ImgArchive Wiki : Image Storage

The imgArchive uses the date as the primary index for storing images in the archive. The archive uses folders to partition the stored images into manageable subsets, first by year and then by day. All images with a date that falls in a particular year are stored under the folder for that year. Within a year folder the images are partitioned further into days; as there are at most 366 days in a year, this is a manageable number of partitions within a year’s main partition.
The configuration file is used for setting the SIA global options. These options can be overridden on the command line or by environment variables, so the values in the file act as defaults. They are especially useful for things such as the path to the current repository, which needs to be known frequently but changes infrequently. To make the options easier to deal with, they are split into sections. The sections have the following titles describing their content:

General
Logging
Network
System Folders
Master Archive
Derivative Archive
Archive Backup
Exif tool

The configuration file is found either in the default location (for example, on Windows this will be "c:/ProgramData/IDK-Software/SIA") or in the location given by setting the home path.

These titles are described in more detail in the following sections:

General

These options are general to the running of the application.

dry run

-dry_run (-dr) This allows you to run SIA in a mode that would carry out the required operations but makes no changes to the archive.

default = false.

useDatabase = true;

Logging

These are the options that relate to the logging functions. Logging messages can go to the log file, to the console screen, or to both.

quiet

-quiet (-q) Requests that SIA print only essential information while performing an operation. default = false

silent

-silent (-s) Suppresses all messages during a run that are normally sent to the console. silent = false

Log Level

LogLevel The log level determines the granularity of the messages that are logged, from fatal only (FATAL) to detailed debug (FINEST). The following values apply: SEVERE, WARNING, INFO, CONFIG, FINE, FINER, and FINEST. These log levels are hierarchically inclusive, which means that if you set a particular log level, such as INFO, the messages that have log levels above that level (SEVERE and WARNING) are also included. If you set the log level to the lowest level, FINEST, your output will include all the messages. The log level can be set separately for the console screen and the log file. The default setting is SUMMARY. default = "SUMMARY";

consoleLevel;
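The inclusive behaviour of the log levels can be illustrated with a small sketch. It uses only the seven level names listed above; the function names are hypothetical and are not part of SIA.

```python
# Level names taken from the list above, most severe first.
LEVELS = ["SEVERE", "WARNING", "INFO", "CONFIG", "FINE", "FINER", "FINEST"]


def included_levels(configured_level: str) -> list:
    """Return every level that is logged when the given level is configured."""
    return LEVELS[: LEVELS.index(configured_level) + 1]


def is_logged(message_level: str, configured_level: str) -> bool:
    """Return True if a message at message_level passes the configured level."""
    return LEVELS.index(message_level) <= LEVELS.index(configured_level)
```

With the level set to INFO, `included_levels("INFO")` gives SEVERE, WARNING and INFO, while a FINE message would be dropped.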

Network

Some of the operations carried out in SIA can take a very long time. In order to control SIA and receive event messages on how it is progressing, two network connections may be set up. The first is the command interface connection over TCP, and the second is the event message interface, which allows messages to be sent to any connected client listening on a UDP connection.

eventsOn = false; // UDP events
serverOn = false;
tcpPortNum = 11000;
udpPortNum = 11001;
udpAddress = "127.0.0.1";
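As a rough illustration of the event interface, a client could listen for event datagrams as sketched below. This assumes that SIA sends plain-text datagrams to the configured udpAddress and udpPortNum; the message format and encoding are not described here, so the decoding step is an assumption, and the function names are hypothetical.

```python
import socket


def open_event_listener(address: str = "127.0.0.1", port: int = 11001) -> socket.socket:
    """Bind a UDP socket on which event datagrams can be received."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((address, port))
    return sock


def receive_event(sock: socket.socket, timeout: float = 5.0) -> str:
    """Wait for one datagram and decode it as text (the encoding is assumed)."""
    sock.settimeout(timeout)
    data, _sender = sock.recvfrom(4096)
    return data.decode("utf-8", errors="replace")
```

A client would call `open_event_listener()` once and then call `receive_event()` in a loop, treating `socket.timeout` as "no event yet".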

System Folders

These folders contain the archive. For example, your original images will be kept in the master folder pointed to by masterPath, and your derivative images will be kept in the derivatives folder pointed to by derivativePath. Because some of these folders can contain large numbers of images, you may need to be selective about where they are placed on your computer. The configuration file gives you the option to define where these folders will be. On the other hand, some of these folders will contain only small numbers of files, so they probably do not need to be changed from the defaults.

hookPath;

This folder will contain any scripts you need to configure the archive. Example scripts can be found in this folder after installation.

toolsPath;

This folder can optionally contain tools that enhance your archive. For example, EXIF tools such as xxx can be used to enhance the reading of the EXIF data in images. Another example would be a tool such as ImageMagick, which enables images to be resized within the archive.

workspacePath;

The workspace is where working copies of the images are stored. workspacePath points to the folder that contains your working images. If this option is not present in the configuration file then the default will be in your home directory under "SIA Workspace"; for example, if your username is "Joe" then this folder will be kept under the path "c:/Users/Joe/SIA Workspace".

derivativePath;

masterPath;

sourcePath;

configPath;

tempPath;

logPath;

homePath;

System Path

This is the folder where the main system files are placed such as the primary index files. SystemPath

indexPath;
historyPath;
ExternalExifTool;
ExternalCommandLine;
ExifMapPath;
MetadataTemplatePath;
backupDestinationPath;
masterViewPath;
DatabasePath;

Master Archive

backup1;
backup2;
backup1Enabled = false;
backup2Enabled = false;

Derivative Archive

backup1;
backup2;
backup1Enabled = false;
backup2Enabled = false;

Archive Backup

backupMediaSize;
fromDate;
toDate;
isFromDate = false;
isToDate = false;

Exif Tool

Archive-Folder=
Source-Folder=
Logging-level=

So for any image with an archive date, a year/day folder can be found in which to place it. This is also a useful scheme in that most photographers will have a rough idea when an image was taken, so finding images in the archive folders is not too difficult. The folders are named in a simple but consistent way. The year folder is simply the year, e.g. “2014”. The day folder is then named using the year, month and day in the form YYYY_MM_DD; for example, “2014_08_12” will be 12 August 2014. Having the sequence year, month and day allows the folders to be automatically sorted in date order when viewed using a file browser such as File Explorer on Windows or Finder on the Mac.
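The folder naming scheme can be sketched in a few lines. The function name is hypothetical, and the path is built relative to the archive root, whose location is not assumed here.

```python
from datetime import date
from pathlib import PurePosixPath


def archive_folder(archive_date: date) -> PurePosixPath:
    """Return the year/day folder, relative to the archive root, for a date."""
    year_folder = f"{archive_date.year:04d}"
    day_folder = f"{archive_date.year:04d}_{archive_date.month:02d}_{archive_date.day:02d}"
    return PurePosixPath(year_folder) / day_folder
```

Because month and day are zero-padded, sorting the folder names as plain text also sorts them in date order.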

Archive Date

As explained above, SIA uses a date to place images in the archive. The most significant dates are:

The date the image was captured in the camera
The file date on the image file
The archived date

The date the image was captured in the camera

Out of the three dates, the capture date is probably the most useful, as it places the image in the archive at the time it was taken. This date may not be available from the file itself, because when the image file is taken from the camera it is given a new create date: the date the image was moved from the camera to the computer storing it. However, there is a way of recovering the capture date, although it only works for images that have EXIF data. EXIF data is used by camera manufacturers to store information about an image within the image file. One of the values stored is the capture date, so by reading this value from the image file the capture date can be recovered. This relies on the EXIF data being present; if it is not, the capture date is lost.

The file date on the image file

The image file creation date will always be present, as it is a property of the image file that must always exist. This date may be a long time after the capture date, as it is the date you transferred your captured image from the camera. If you do this frequently then the date will not be too far from the capture date.

The archived date

The date on which the image was placed in the archive is the last significant date. This date can be even further in the future from the capture date. However, if you add images to the archive at the same time as transferring them from the camera, then again you will not be too far out from the capture time.

Default archive date

By default SIA will try to find the earliest date, or the date that is most likely to be the capture date. This is carried out in the following order:

Read EXIF information from all images in the set (in the case of RAW and JPG pairs).
Find the create date from all image files.
Use today's date.
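The fallback order above can be sketched as follows. This is a simplified illustration, not SIA's implementation: it assumes the EXIF capture dates and file create dates have already been read, and it simply takes the earliest value at each step, whereas SIA gives priority to the RAW file in a RAW/JPG pair. The function name and return shape are hypothetical.

```python
from datetime import datetime
from typing import Optional, Sequence, Tuple


def default_archive_date(
    exif_dates: Sequence[Optional[datetime]],
    file_create_dates: Sequence[datetime],
    today: datetime,
) -> Tuple[datetime, str]:
    """Return (archive date, source) following the three fallback steps."""
    # Step 1: use an EXIF capture date if any image in the set has one.
    found = [d for d in exif_dates if d is not None]
    if found:
        return min(found), "exif"
    # Step 2: fall back to the file create dates.
    if file_create_dates:
        return min(file_create_dates), "file"
    # Step 3: last resort, today's date.
    return today, "today"
```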

Reading EXIF information

SIA will read EXIF information in two ways: using an internal EXIF reader and (if configured) using an external reader. SIA will try both methods; the internal reader is tried first, then the external reader. All images in the set are read for any EXIF data and the results stored. This EXIF data is also used to populate the metadata for each image and the set metadata. The set metadata is shared across all images in the set and, crucially, includes the shared archive date. If one of the images is missing data, by default the information will be filled in from the other images if available. By default, in an image set with both RAW and JPG images, the information in the RAW will be the primary information source. However, the JPG is the image most likely to contain EXIF data.

Find the create date from all image files

If the capture date cannot be found then the create date will need to be used. In a RAW/JPG image pair, the RAW file create-time will be used.

Use today's date

As a fallback, if no other date can be found, today's date will be used. This is normally a last resort.

Date checks

SIA will do a sanity test on the dates. If the dates do not make sense then the image will not be added to the archive and an error is raised to flag that there is a possible error in the dates. The capture date, the image file date and today's date must each be the same as or later than the one before; i.e. the capture date must be the earliest date, the image file date must be the same as or later than the capture date, and finally today's date must be the same as or later than the image file date. Anomalies in the dates can be due to the time and date being incorrectly set on the camera or the archive computer. The default archive date can be overridden in these cases. One common problem is that the camera date may never have been changed from the manufacturer’s default.
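The ordering rule can be expressed as a short check. This is a minimal sketch under the assumption that a missing capture date is simply skipped; the exception and function names are hypothetical.

```python
from datetime import datetime
from typing import Optional


class DateOrderError(ValueError):
    """Raised when the dates for an image are not in a plausible order."""


def check_dates(capture: Optional[datetime], file_date: datetime, today: datetime) -> None:
    """Require capture date <= image file date <= today's date."""
    if capture is not None and capture > file_date:
        raise DateOrderError("capture date is later than the image file date")
    if file_date > today:
        raise DateOrderError("image file date is later than today's date")
```

A caller would catch `DateOrderError`, refuse to add the image, and suggest one of the override options.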

Specifying and controlling the archive date

SIA has a number of options to specify and control the archive date. These "use date" options override the default archive date selection, and are used if the default dates are somehow incorrect and a new date needs to be used. The options are as follows:

Use today's date
Use file date
Use date

Use today's date

This option forces the archive date to be today's date. All other dates are ignored. –use_date_today

Use file date

Use file date forces the archive date to be the image file date. In the case of a RAW/JPG pair, the RAW image file date will be used. –use_file_date

Use date

This option forces the archive date to be the date specified in the date argument. This can be used in the case where the capture date in the camera may be incorrect and the correct date needs to be explicitly specified. –use_date=

For example: –use_date=2014.07.12

The above will set the archive date to the 12th day of July 2014.

Image Extensions

In the SIA configuration folder is the image extensions file. This file contains the file extensions that SIA uses to identify the media encoding type and determine which category an image fits. There are three categories:

1. Displayable images (Picture types).
2. Raw images (RAW types).
3. Images that may be displayable, but will not be raw images (Image types). These may have been generated by a camera or some third-party image editor. Most image editors will save the edited image with a known extension. For example, Photoshop uses the extension “.psd”.

Some images may fit in more than one category. The files a camera most often generates are JPG files, which will normally have the extension “.jpg”. The other files a camera often generates are RAW images, and these are more tricky. Each camera manufacturer uses its own file extension to denote a RAW image, and in some cases more than one. For example, Nikon uses “.NEF” to denote a raw image file; Canon, on the other hand, uses “.CRF”.

The image extensions file is split into three sections, one for each category type. An example file is shown below.

<!-- Raw image (RAW Types) -->
Dng=raw,Adobe Digital Negative
NEF=raw,Nikon RAW
CRF=raw,Canon RAW

Bmp=img,Bitmap
Jpg=img,Joint Photographic Experts Group
Gif=pic,Graphics Interchange Format
Jpg=img,Joint Photographic Experts Group
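A file with one `Ext=category,Description` entry per line could be read along the following lines. This is only a sketch of one way to parse such a file, not SIA's own parser; the function name is hypothetical, and treating extensions as case-insensitive and ignoring repeated entries are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def parse_extensions(text: str) -> Dict[str, List[Tuple[str, str]]]:
    """Map each lower-cased extension to its (category, description) entries."""
    table: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comment lines and anything that is not key=value.
        if not line or line.startswith("<!--") or "=" not in line:
            continue
        extension, _, rest = line.partition("=")
        category, _, description = rest.partition(",")
        key = extension.strip().lower()
        entry = (category.strip(), description.strip())
        # An extension may appear in more than one category; keep each once.
        if entry not in table[key]:
            table[key].append(entry)
    return dict(table)
```

Keeping a list per extension allows for images that fit more than one category.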

As new extensions are created by camera manufacturers and photo editor authors, this file may be updated to reflect the changes.

Managed and un-managed images

SIA provides managed and un-managed versions of each image in the archive. The two types are stored in separate areas: the managed image is stored within the archive, while the un-managed images are stored in an area that is easily accessible to the user. When a new image is added to the archive it becomes managed. SIA assigns a unique number to the image, which is then tracked within the archive and is normally not viewable by anyone. At the same time as the image is added, a viewable/editable version is added to the un-managed area. The user of the archive is able to view and edit the un-managed images, and even delete them, with no impact on the archive. The un-managed images can be refreshed from the managed images at any time. The managed version, on the other hand, should never be accessed by the user and must never be modified.

If an un-managed image is edited and the changes need to be archived, the image can be checked in to the archive. When the image is checked in, a copy of the un-managed image is placed in the managed area of the archive with a version number as part of the file name. This prevents the original version of the image from being overwritten and identifies the new version. A unique number is then assigned and the new image is tracked within the archive.

Backing up and Mirroring

When a managed image is added to the archive, all mirrors are automatically updated with the new image. This ensures that more than one copy of the managed image exists in almost real time. When backing-up to off-line media such as Blu-Ray as each unique image is added to the archive

Database support for metadata in SIA

Once the metadata for each image has been captured, some means of storing and accessing that data is needed. The metadata lends itself to a number of methods of storage and access. SIA supports three types of storage:

1. XML files
2. CSV files
3. SQLite database

Each has advantages and disadvantages. Using all three methods helps to mitigate some of the disadvantages and supports other storage and access methods. Each is described in detail in the following sections.

XML files

XML stands for Extensible Markup Language. It is a markup language that defines a set of rules for encoding documents in a format that is both human readable and machine readable. The standard for XML is defined in the XML 1.0 Specification produced by the World Wide Web Consortium (W3C). This format can easily be converted into HTML web pages. The conversion is so common that tools such as XSL parsers have been developed to make the process easier. The disadvantage of XML files is that they are slow to search, as each file needs to be opened, read and closed. An archive will have a large number of files to read, making this a time-consuming process compared to accessing a database to carry out a similar search.

The XML Database

This is essentially a collection of XML files placed in folders in a consistent way, which allows the XML files to be accessed effectively. Any software tool will be able to use the XML files by following the access rules. These rules are as follows:

Each image has an XML file with its metadata. This will always contain the identification information for the image. The file name for the XML file is the full filename with extension plus “.xml”. For example, the image file “DSC_1234.jpg” will have an XML file called “DSC_1234.jpg.xml”.
The image set will also have an XML metadata file, which will have a file name of the main image filename without the extension, plus “.xml”. For example, the image set “DSC_1234” will have an XML file called “DSC_1234.xml”.
These XML files will be stored in the day folder in which the image resides. Each day folder will contain a folder called “.metadata”; the dot at the start makes the folder hidden. Under this folder will be another folder called “xml”, and the XML files are stored under that folder.
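The access rules above translate directly into path construction. The sketch below uses hypothetical function names and takes the day folder as an argument, so no archive root is assumed.

```python
from pathlib import PurePosixPath


def image_xml_path(day_folder: str, image_filename: str) -> PurePosixPath:
    """Per-image metadata file: full file name with extension, plus '.xml'."""
    return PurePosixPath(day_folder) / ".metadata" / "xml" / (image_filename + ".xml")


def set_xml_path(day_folder: str, image_filename: str) -> PurePosixPath:
    """Image-set metadata file: main file name without extension, plus '.xml'."""
    stem = PurePosixPath(image_filename).stem
    return PurePosixPath(day_folder) / ".metadata" / "xml" / (stem + ".xml")
```

For the image “DSC_1234.jpg” in the day folder “2014/2014_08_12”, these give “DSC_1234.jpg.xml” and “DSC_1234.xml” under “.metadata/xml”.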

CSV files

CSV files are comma-separated values files. The format may also be called character-separated values, because the separator character does not have to be a comma; this is the case in SIA. The files store tabular data in plain text form. CSV files can be imported into both spreadsheet applications such as Excel and databases such as Access. Importantly, this includes SQLite; CSV files can be used to provide a backup for SQLite.

The CSV Database

The CSV file database, like the XML database, is a collection of plain files, in this case CSV-formatted files. Unlike XML, there will be one file per set of metadata attributes for the day's set of images. Each image's attributes are contained in one line of the CSV file. The sets of attributes are connected by a sequence number, which is the first field in each row. If a set of attributes is not available for an image, an entry with the sequence identifier is still added but the values can be blank. Each day folder will contain a CSV folder under the metadata directory. This CSV folder will contain the following data sets:

1. File Properties
2. Asset Properties
3. Camera Information
4. Copyright Properties
5. GPS Properties
6. Media Properties

Each data set is contained in one CSV file. The data sets contained in the CSV files reflect the same data held in the SQLite tables.
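The way the data sets are tied together by sequence number can be sketched with Python's csv module. The separator character used by SIA is not stated here, so the semicolon below is an assumption, as are the function names and the column contents in the usage note.

```python
import csv
import io
from typing import Dict, List


def read_data_set(text: str, delimiter: str = ";") -> Dict[str, List[str]]:
    """Index the rows of one data set by the sequence number in the first field."""
    rows = csv.reader(io.StringIO(text), delimiter=delimiter)
    return {row[0]: row[1:] for row in rows if row}


def attributes_for(sequence: str, data_sets: Dict[str, Dict[str, List[str]]]) -> Dict[str, List[str]]:
    """Collect the attributes for one image from every data set."""
    return {name: rows.get(sequence, []) for name, rows in data_sets.items()}
```

For example, reading an asset data set and a GPS data set and calling `attributes_for("2", ...)` returns the row for image 2 from each, with blank strings where the GPS values were left empty.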

SQLite database

SQLite is an open source database that is used in both large and small systems. Adobe uses it for Lightroom, Adobe’s archiving application. Another user of SQLite is the programming language Python; a SQLite database can easily be accessed using Python. As this database is popular and, unlike a number of other databases, completely free, it has a large following. The main advantage of a database is that the data it holds can be searched and sorted much more quickly than in a flat file database. Data can be queried using SQL, and applications can be made to use the data quickly. The disadvantage is that database systems are more complex to set up, and damage to a database affects virtually all the systems using it.

Sequence Identifiers

Each image in the archive is uniquely identified by a sequence number, which is used to cross-reference images within the databases. The databases generate this number in two ways. The SQLite database generates it as a unique primary key in the Asset Properties table. All other tables in the database then contain this number as their primary key. Each image in the database must be referenced in the Asset Properties table; in all other tables it is optional. The Asset Properties table contains the full path to the image in the archive, and an index that performs the reverse: given an image path, it returns the sequence number. In the case of the SQLite database, the database can generate this unique sequence number, carry out the indexing into the other tables and maintain a link to the actual image in the archive. Flat file databases such as the XML and CSV databases cannot do this directly. The reason is that there is a set of CSV files per day, and the sequence numbers are generated at the time the image is placed in the archive, not on the date the image was taken. The sequence numbers are therefore not guaranteed to be in any order.
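The SQLite side of this scheme can be sketched with Python's built-in sqlite3 module. The table and column names below are hypothetical and are not SIA's actual schema; the sketch only shows a generated primary key, a second table sharing that key, and a unique index on the path for the reverse lookup.

```python
import sqlite3
from typing import Optional


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a minimal Asset Properties table and one dependent table."""
    conn.executescript(
        """
        CREATE TABLE asset_properties (
            sequence_id  INTEGER PRIMARY KEY AUTOINCREMENT,
            archive_path TEXT NOT NULL UNIQUE  -- UNIQUE also creates the reverse index
        );
        CREATE TABLE gps_properties (
            sequence_id INTEGER PRIMARY KEY REFERENCES asset_properties(sequence_id),
            latitude    REAL,
            longitude   REAL
        );
        """
    )


def add_image(conn: sqlite3.Connection, archive_path: str) -> int:
    """Insert an image and return the sequence number the database generated."""
    cur = conn.execute(
        "INSERT INTO asset_properties (archive_path) VALUES (?)", (archive_path,)
    )
    return cur.lastrowid


def sequence_for_path(conn: sqlite3.Connection, archive_path: str) -> Optional[int]:
    """Reverse lookup: archive path to sequence number."""
    row = conn.execute(
        "SELECT sequence_id FROM asset_properties WHERE archive_path = ?",
        (archive_path,),
    ).fetchone()
    return row[0] if row else None


def path_for_sequence(conn: sqlite3.Connection, sequence_id: int) -> Optional[str]:
    """Forward lookup: sequence number to archive path."""
    row = conn.execute(
        "SELECT archive_path FROM asset_properties WHERE sequence_id = ?",
        (sequence_id,),
    ).fetchone()
    return row[0] if row else None
```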
To solve the flat-file problem, SIA maintains a file-based sequence number lookup. Given a sequence number, the lookup returns the full archive path. To carry out the reverse, you start with the archive path, so the folder holding the correct CSV file is known; the Asset Properties CSV file is in image file name order, so finding the sequence number is trivial.

Archive integrity

One main function of an image archive is to safeguard the images within it. The archive can be damaged, either intentionally or unintentionally, at any time. If damage is done to the archive, the first thing for the archive to do is to inform you, the user, as soon as possible that the damage has taken place. The next thing is to inform you what damage has been done, and lastly to help you fix the damage. SIA has mechanisms to monitor the integrity of the archive by recording the times that images are modified. In addition, it maintains a file map of the archive with both CRC and MD5 checksums of each file in the archive. If the file map does not match the contents of the archive then the differences can be listed. Sometimes these differences are relatively harmless, such as an image being modified without being marked as checked out; on the other hand, a whole year’s worth of images may be missing. The file map will highlight this change. From the user's point of view, a missing year may not be noticed until images from that year are needed, and a long period of time may have passed before the damage becomes apparent. Once the damage is identified, a list of damaged or missing files can be generated and the archive can be repaired from an archive mirror by copying the files back into the archive. A full integrity check can then be made of the archive to verify that the repair was successful.

Hook scripts

A hook script is a program triggered by an archive repository event, such as an image being about to be processed for addition to the archive. This is, for example, a point where, if the image is a RAW type, a picture type may be generated so that both can be archived as a RAW/Picture pair.

Views

Image Storage

The image machine uses the date as a primary index for storing images in the archive. The archive uses folders to partition the stored images into manageable subsets. These subsets are partitioned into years. All images using a date that fall in a particular year will be stored under folder of that year. Within a year folder the year is partitioned further into days, as there are 365 maximum days in the year then this is a manageable number of partitions within a year’s main partition. So for any image with an archive date, can be found a year / day folder in which to place it. This also is a useful scheme in that most photographers will have a rough idea when an image was taken, so to then find images in the archive folders is not too difficult. The folders are named in a simple but consistent way. The year folder is simply the year, i.e. “2014”. Then the day folder is created using the year, month and day in the form , for example: “2014_08_12” will be 12 August 2014. Having the sequence year, month and day allows the folders to be automatically sorted in date order when viewed using a file browser such as File Explorer on windows or finder on the MAC. Archive Date The SIA as explanted above the archive uses a date to place images. The most significant dates are: 1. The date the image was captured in the camera 2. The file date on the image file 3. The archived date The date the image was captured in the camera Out of the three dates the capture date is probably the most useful as it places the image in the archive at the time it was taken. This date may not be found as when the images file is taken from the camera a new create date in the image of the date the image was moved from the camera to the computer storing the image. However where is a way of recovering the capture date but this will only be present in some images that have EXIF data? EXIF data is used by the camera manufactures to store information about an image in the image. 
One of the values stored is the capture date. By reading this date from the image file then the capture date can be recovered. This relies on the EXIF data being present if not then the capture date is lost. The file date on the image file The file image creation date will always be present as it is a property of the image file that always must exist. This date may be a large period away from the capture date as this is the date you transfer your captured image from the camera. If you do this frequently then the then the date will not be too far from the capture date. The archived date The date on which the image was place in the archive is the last significant date. Is date can be even further away in the future from the capture date? However if you add images into the archive at the same time as transferring from the camera then again you will not be too far out from the capture time. Default archive date By default SIA will try to find the earliest date or a date that is most likely to be the capture date. This is carried out by the following. 1. Read EXIF information from all images in the set (in the case of RAW and JPG pairs). 2. Find the create date from all image files. 3. Use todays date. Reading EXIF information SIA will read EXIF information in two ways: Using an Internal EXIF reader and (if configured) using an external reader. SIA will try both methods; the internal reader will be tried first then the external reader. All images in the set will be read for any EXIF and the results stored. This EXIF data will also be used to populate the metadata information for each image and set metadata. This is metadata that is shared across all images in the set and crucially the shared archive date. If one of the images is missing data, by default the information will be filled in from other images if available. By default in an image set with both a RAW and JPG images the information in the RAW will be the primary information source. 
However the JPG will most likely contain EXIF data. Find the create date from all image files If the capture date cannot be found then the create date will need to be used. In a RAW/JPG image pair, the RAW file create-time will be used. Use todays date. As a fall back and no other date can be found todays date will be used. This is normally a last resort. Date checks SIA will do a sanity test on the dates. if the dates do not make sense then the image will not be added to the archive and an error raised to flag there is a possible error in the dates. The Capture dates, the image file dates and todays date must be each be in the future I.e. the capture date must be the earliest date, the image file must be the same or later that the capture date and finally todays date must be the same or later than the image file date. Anomalies in the dates can be due to the time and date being incorrectly set on the camera or archive computer. The default archive date can be overridden in these cases. One common problem is that the camera date may not be set from the manufacture’s default. Specifying and controlling the archive date. SIA has a number of options to specify and control the archive date. These can be used to override the default archive date selection. These are as follows: Use date options: These are used to override the archive date being selected. These are used if the default dates are somehow incorrect and a new date needs to be used. These options are as follows: 1. Use todays date 2. Use file date 3. Use date Use todays date This option forces the archive date to be todays date. All other dates are ignored. –use_date_today

Use file date Use file date forces the archive data to be the image file date. In the case of a RAW/JPG pair then the RAW image file date will be used. –use_file_date Use date This option forces the archive date to be the date specified in the date argument. This can be used in the case were the capture date in the camera may be incorrect and the correct date to be explicitly specified. –use_date= For example: –use_date=2014.07.12 The above will set the archive date to the 12 day of July 2014. Image Extensions In the SIA configuration folder is the Image extension file. This file contains the file extensions that SIA uses to identify the media encoding type and determine which category type an image fits. There are three categories; 1. Displayable images (Picture types). 2. Raw image (RAW Types). 3. Images that may be displayable, but will not be raw images. These may have been generated by a camera or some third party image editor. Most image editors will save the edited image with a known extension. For example Photoshop uses the extension .psd” (Image types). Some images may fit in more than one category. The most often images the a camera generates are “JPG” files these will normally have the extension “.jpg” the other most often generated files that a camera generates are RAW images these are more tricky. Each camera manufacturer uses their own defined file extension to denote RAW image, in some cases more than one extension to denote RAW images For example Nikon uses “.NEF” to denote the file is a raw image file, Canon on the other hand use “*CRF” to denote a raw image file. The image extensions file is split into three sections, one for each category type. An example file is shown below. <!– Raw image (RAW Types)– > Dng=raw,Adobe Digital Negative NEF=raw,Nikon RAW CRF=raw,Canon RAW

Bmp=img,Bitmap Jpg=img,Joint Photograohic Experts Group Gif=pic,Graphics Interchange Format Jpg=img,Joint Photograohic Experts Group

As new extensions are created by camera manufactures and photo editor authors, then this file may be update to reflect the new changes. Managed and un-managed images SIA provides managed and un-manage versions of each image in the archive. Images for each type are stored to different separate areas. The managed image is stored within the archive. the un-managed images are stored in an area that is easily accessible to the user. When a new image is added to the archive it becomes managed. SIA will assign a unique number to the image and tracked within the archive and is normally not viewable by anyone. At the same time the image is added a viewable/editable version is also added in the un-managed area. The user of the archive is able to view and edit the un-managed images even delete them with no impact on the archive. This un-managed images can be refreshed from the managed images at any time. The managed version on the other hand should never be accessed by the user and must never be modified. If an un-managed image is edited and the changes need to be archived, then the images can be checked-in to the archive. When the image is checked-in a copy of the unmanaged image is placed in the managed area of the archive with a version number as part of the file name. This will prevent the original version of the image not to be over written and identify the new version. A unique number is then added and the new image is tracked within the archive. Backing up and Mirroring When a managed image is added to the archive all mirrors automatically updated with the new image. This ensures that more than one copy of the managed image exists in almost real-time. When backing-up to off-line media such as Blu-Ray as each unique image is added to the archive

Database support for metadata in SIA

Once the metadata for each image has been captured, it needs to be stored and made accessible. The metadata lends itself to a number of storage and access methods. SIA supports three types of storage:

1. XML files
2. CSV files
3. SQLite database

Each has advantages and disadvantages. Using all three together helps to mitigate some of the disadvantages and supports other storage and access methods. Each is described in detail in the following sections.

XML files

XML stands for Extensible Markup Language. It is a mark-up language that defines a set of rules for encoding documents in a format that is both human readable and machine readable, i.e. readable by a computer. The standard for XML is defined in the XML 1.0 Specification produced by the World Wide Web Consortium (W3C). XML can be easily converted into HTML web pages. This conversion is so common that tools such as XSL processors have been developed to make it easier.

The disadvantage of XML files is that they are slow to search, as each file needs to be opened, read and closed. An archive will have a large number of files to read, so a search is time consuming compared with a similar search in a database.

The XML Database

This is essentially a collection of XML files placed in folders in a consistent way, which allows the XML files to be accessed effectively. Any software tool can use the XML files by following the access rules. These rules are as follows:

Each image has an XML file containing its metadata. This always contains the identification information for the image. The file name of the XML file is the full image file name, including extension, plus “.xml”. For example, the image file “DSC_1234.jpg” will have an XML file called “DSC_1234.jpg.xml”.

The image set also has an XML metadata file, whose file name is the main image file name without the extension. For example, the image set “DSC_1234” will have an XML file called “DSC_1234.xml”.

These XML files are stored in the day folder in which the image resides. Each day folder contains a folder called “.metadata”; the dot at the start makes the folder hidden. Under this folder is another folder called “xml”, and the XML files are stored in that folder.
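The naming rules above can be sketched in a few lines of Python. This is only an illustration, not SIA's own code: the function names and the example day folder "2015/2015-03-14" are hypothetical, and only the ".metadata/xml" location and the ".xml" naming come from the rules described here.

```python
from pathlib import PurePosixPath


def image_metadata_path(image_path):
    """Metadata file for one image: full file name (with extension) plus '.xml'."""
    image = PurePosixPath(image_path)
    # The XML files live under the hidden '.metadata/xml' folder of the day folder.
    return image.parent / ".metadata" / "xml" / (image.name + ".xml")


def image_set_metadata_path(image_path):
    """Metadata file for the image set: main file name without extension plus '.xml'."""
    image = PurePosixPath(image_path)
    return image.parent / ".metadata" / "xml" / (image.stem + ".xml")


# Hypothetical day folder, used only to show the shape of the result.
example = "2015/2015-03-14/DSC_1234.jpg"
print(image_metadata_path(example))
print(image_set_metadata_path(example))
```

A tool that follows these two functions can locate the metadata for any image without opening an index, which is the point of keeping the folder layout consistent.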

CSV files

CSV files are comma separated value (CSV) files. The format may also be called character separated values, because the separator character does not have to be a comma; this is the case in SIA. The files store tabular data in plain text form. CSV files can be imported into both spreadsheet applications such as Excel and databases such as Access. Importantly, this includes SQLite, so CSV files can be used to provide a backup for SQLite.

The CSV Database

The CSV database, like the XML database, is a collection of plain files, in this case CSV formatted files. Unlike XML, there is one file per set of metadata attributes for each day's set of images. Each image's attributes are contained in one line of the CSV file. The sets of attributes are connected by a sequence number, which is the first field in each row. If a set of attributes is not available for an image, an entry with the sequence number is still added but the values can be blank.

Each day folder contains a CSV folder under the metadata directory. This CSV folder contains the following data sets:

1. File Properties
2. Asset Properties
3. Camera Information
4. Copyright Properties
5. GPS Properties
6. Media Properties

Each data set is contained in one CSV file. The data sets contained in the CSV files reflect the same data held in the SQLite tables.
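As a rough illustration of how the data sets for one day can be joined on the sequence number, the sketch below reads two of them with Python's standard csv module. The separator character "|", the column contents and the function names are assumptions made for the example; the text above does not state which separator or columns SIA actually uses.

```python
import csv
import io

SEPARATOR = "|"  # assumption: SIA's real separator character is not stated here


def load_data_set(text, separator=SEPARATOR):
    """Parse one data set into {sequence number: remaining fields}."""
    rows = {}
    for row in csv.reader(io.StringIO(text), delimiter=separator):
        if not row:
            continue  # skip empty lines
        rows[int(row[0])] = row[1:]  # the sequence number is the first field
    return rows


def attributes_for(sequence, data_sets):
    """Collect the attributes of one image from every data set."""
    return {name: rows.get(sequence, []) for name, rows in data_sets.items()}


# Hypothetical contents for one day folder; image 42 has blank GPS values.
asset_csv = "17|DSC_1234.jpg\n42|DSC_1235.jpg\n"
gps_csv = "17|51.5|-0.12\n42||\n"

data_sets = {
    "Asset Properties": load_data_set(asset_csv),
    "GPS Properties": load_data_set(gps_csv),
}
print(attributes_for(42, data_sets))
```

In a real day folder each string would be replaced by the contents of the corresponding CSV file; the join on the first field stays the same.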

SQLite database

SQLite is an open source database that is used in both large and small systems. Adobe uses it for Lightroom, Adobe’s archiving application. Another user of SQLite is the programming language Python, from which a SQLite database can be easily accessed. As the database is popular and completely free, unlike a number of other databases, it has a large following.

The main advantage of a database is that the data it holds can be searched and sorted much more quickly than in a flat file database. Data can be queried using SQL, and applications can be written to use the data quickly. The disadvantages are that database systems are more complex to set up, and that damage to a database affects virtually all the systems using it.

Sequence Identifiers

Each image in the archive is uniquely identified by a sequence number, which is used to cross reference images within the databases. The databases generate this number in two ways. The SQLite database generates it as a unique primary key in the Asset Properties table. All other tables in the database then contain this number as their primary key. Each image in the database must be referenced in the Asset Properties table; in all other tables an entry is optional. The Asset Properties table contains the full path to the image in the archive, and an index that performs the reverse: given an image path, it returns the sequence number.

The SQLite database can therefore generate the unique sequence number, carry out the indexing into the other tables and maintain a link to the actual image in the archive. Flat file databases such as the XML and CSV databases cannot do this directly. The reason is that there is a set of CSV files per day, and the sequence numbers are generated at the time the image is placed in the archive, not on the date the image was taken. The sequence numbers are therefore not guaranteed to be in any order.
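The SQLite side of this scheme can be sketched with Python's built-in sqlite3 module. The table and column names below are hypothetical, as is the example path; the sketch only shows the idea of a generated primary key in an Asset Properties table, the same key reused in another table, and a unique index on the path for the reverse lookup.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical schema: the key generated in asset_properties is
    # reused as the primary key of every other table.
    conn.executescript("""
        CREATE TABLE asset_properties (
            sequence_id INTEGER PRIMARY KEY,  -- generated by SQLite
            path TEXT NOT NULL UNIQUE         -- UNIQUE also indexes the path
        );
        CREATE TABLE gps_properties (
            sequence_id INTEGER PRIMARY KEY
                REFERENCES asset_properties(sequence_id),
            latitude REAL,
            longitude REAL
        );
    """)


def add_image(conn, path):
    """Insert an image and return the sequence number SQLite generated."""
    cur = conn.execute("INSERT INTO asset_properties (path) VALUES (?)", (path,))
    return cur.lastrowid


def path_for(conn, sequence_id):
    """Sequence number to archive path."""
    row = conn.execute(
        "SELECT path FROM asset_properties WHERE sequence_id = ?", (sequence_id,)
    ).fetchone()
    return row[0] if row else None


def sequence_for(conn, path):
    """Archive path to sequence number (the reverse lookup)."""
    row = conn.execute(
        "SELECT sequence_id FROM asset_properties WHERE path = ?", (path,)
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
create_schema(conn)
seq = add_image(conn, "2015/2015-03-14/DSC_1234.jpg")
print(seq, path_for(conn, seq))
```

A set of per-day flat files has no counterpart to this generated key or to the path index, which is the gap the flat file databases have to fill by other means.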
To solve this problem SIA maintains a file based sequence number lookup. Given a sequence number, the lookup returns the full archive path. To carry out the reverse, you start with the archive path, so the folder containing the correct CSV file is known. The Asset Properties CSV file is in image file name order, so finding the sequence number is trivial.

Archive integrity

One main function of an image archive is to safeguard the images within it. The archive can be damaged, either intentionally or unintentionally, at any time. If damage is done, the first thing the archive must do is inform you, the user, as soon as possible that the damage has taken place. The next is to tell you what damage has been done, and lastly to help you fix it.

SIA has mechanisms to monitor the integrity of the archive by recording the times at which images are modified. In addition, it maintains a file map of the archive with both CRC and MD5 checksums of each file in the archive. If the file map does not match the contents of the archive, the differences can be listed. Sometimes these differences are relatively harmless, such as an image being modified without being marked as checked out; on the other hand, a whole year’s worth of images may be missing. The file map will highlight this change. From the user’s point of view, a missing year may not be noticed until images from that year are needed, so a long period of time may pass before the damage becomes apparent.

Once the damage is identified, a list of damaged or missing files can be generated and the archive can be repaired from an archive mirror by copying the files back into the archive. A full integrity check can then be made of the archive to verify that the repair was successful.

Hook scripts

A hook script is a program triggered by an archive repository event, such as an image being about to be processed for addition to the archive.
This is, for example, a point at which, if the image is a RAW type, a picture type may be generated so that both can be archived as a RAW/Picture pair.

Views