
Releases: Georgetown-University-Libraries/File-Analyzer

Integrate Marc4j, Build APTrust Bag, Simplify classpath entries

16 Jul 17:15


Marc4j Integration

With the creation of a Maven-compatible version of Marc4j (https://github.com/ksclarke/freelib-marc4j), several Marc-related tasks have been integrated into the FileAnalyzer. These tasks formerly resided in https://github.com/Georgetown-University-Libraries/Marc-File-Analyzer


Classpath Simplification

Use built-in Eclipse variable for Maven dependencies.

Create APTrust compliant bag files


Fix bug in Ingest Folder Creation when a zip file is generated

08 May 22:41


v3.0.4

Fix error in which a zip file prevents rerunning the ingest build

DSpace Ingest: Create an optional "collections" file

28 Mar 00:59


When processing an ingest inventory, if the fifth column header is "collections", a collections file is generated for each row whose fifth column contains a value.


When a value is found for "collections" generate a file named "collections" within the ingest folder.
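The rule above can be sketched roughly as follows. This is a minimal illustration, not the File Analyzer implementation: the class name CollectionsFileSketch, both method names, and the choice to write the cell value verbatim followed by a newline are all assumptions, since the release notes do not describe the file's contents.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch; names and file contents are assumptions.
final class CollectionsFileSketch {

    // Returns the text to write, or null when no collections file is needed.
    static String collectionsContent(String[] header, String[] row) {
        // The 5th column (index 4) must be headed "collections".
        if (header.length < 5 || !"collections".equals(header[4].trim())) {
            return null;
        }
        // The row must carry a non-empty value in that column.
        if (row.length < 5 || row[4].trim().isEmpty()) {
            return null;
        }
        // Assumption: the cell value is written as-is, followed by a newline.
        return row[4].trim() + "\n";
    }

    // Writes a file named "collections" inside the ingest folder when needed.
    static boolean writeCollectionsFile(String[] header, String[] row, Path ingestFolder) {
        String content = collectionsContent(header, row);
        if (content == null) {
            return false;
        }
        try {
            Files.write(ingestFolder.resolve("collections"),
                    content.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true;
    }
}
```

Keeping the decision (collectionsContent) separate from the file write makes the column check easy to exercise without touching the disk.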


When validating ingest folders, allow for the presence of a "collections" file within an ingest folder.


Modify Multiparser Rules to allow categorization of results

19 Mar 23:00


Sample Parser Rule File

[COLS]
ITEM,TAG,NOTE

[PATTERNS]
# Group Marc Notes that Need Attention

[CATEGORY: ERROR: woodstock]
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","(?<NOTE>.*(woodstock).*)"$

[CATEGORY: ERROR: bindery]
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","(?<NOTE>.*(bindery).*)"$

[CATEGORY: ERROR: damaged]
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","(?<NOTE>.*(damaged?|flood).*)"$

[CATEGORY: ERROR: nostorage]
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","(?<NOTE>.*(nostorage).*)"$

[CATEGORY: ERROR: preservation]
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","(?<NOTE>.*(preservation|repair|infirmary|marked|loose|poor condition).*)"$

[CATEGORY: ERROR: may-prob]
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","(?<NOTE>.*(mayprob).*)"$

[CATEGORY: ERROR: durkin]
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","(?<NOTE>.*(durkin).*)"$

# Group marc notes that may be skippable

[CATEGORY: SKIP: initial]
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","(?<NOTE>[a-zA-Z]{1,3})"$

[CATEGORY: SKIP: pamphlet]
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","(?<NOTE>pamphlet.*)"$

[CATEGORY: SKIP: initial-date]
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","\s*(?<NOTE>[a-zA-Z]{1,3}\s*\d{1,2}/\d{2,4})\s*"$
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","\s*(?<NOTE>[a-zA-Z]{1,3}\s*\d{1,2}/\d{1,2}/\d{2,4})\s*"$
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","\s*(?<NOTE>\d{1,2}/\d{2,4}\s*[a-zA-Z]{1,3})\s*"$
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","\s*(?<NOTE>\d{1,2}/\d{1,2}/\d{2,4}\s*[a-zA-Z]{1,3})\s*"$
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","\s*(?<NOTE>\d{1,2}/\d{2,4})\s*"$
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","\s*(?<NOTE>\d{1,2}/\d{1,2}/\d{2,4})\s*"$
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","\s*(?<NOTE>(Mon|Tue|Wed|Thu|Fri|Sat|Sun) .*)"$

[CATEGORY: SKIP: cls-dac-ppc]
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","(?<NOTE>\s*(cls nos|dac |ppc).*)"$

[CATEGORY: SKIP: hold]
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","(?<NOTE>\s*(hold in circ).*)"$

[CATEGORY: SKIP: shipment]
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","(?<NOTE>.*(shipi?ment).*)"$

[CATEGORY: SKIP: cancelled]
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","(?<NOTE>.*(cancelled).*)"$

[CATEGORY: SKIP: header]
^("?)(?<ITEM>(record_num|Short Item ID)*)\1,"(?<TAG>[^,]*)","(?<NOTE>.*)"$

[CATEGORY: SKIP: blank]
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","(?<NOTE>\s*)"$

[CATEGORY: SKIP: lv610]
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","(?<NOTE>lv\s+6/10)"$

# Identify rules that may need attention

[CATEGORY: WARN: cart]
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","(?<NOTE>.*CART.*)"$

[CATEGORY: WARN: transfer]
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","(?<NOTE>.*transfer.*)"$

# Default rule for items not previously caught

[CATEGORY: WARN: uncategorized]
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","(?<NOTE>.*)"$

[CATEGORY: WARN: no-quote]
^("?)(?<ITEM>[^,]*)\1,(?<TAG>[^,]*),(?<NOTE>.*)$
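As a rough illustration of how rules like the ones above could be applied, the following Java sketch tries each category pattern in file order and reports the first one that matches. The class name CategoryRuleSketch, the classify method, and the UNMATCHED label are hypothetical and are not taken from the File Analyzer source. First-match-wins ordering is an assumption suggested by the "Default rule for items not previously caught" comment. Only three of the sample patterns are included.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the File Analyzer's own rule engine.
final class CategoryRuleSketch {

    // Category label -> pattern, kept in rule-file order (LinkedHashMap preserves it).
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("ERROR: bindery", Pattern.compile(
            "^(\"?)(?<ITEM>[^,]*)\\1,\"(?<TAG>[^,]*)\",\"(?<NOTE>.*(bindery).*)\"$"));
        RULES.put("SKIP: blank", Pattern.compile(
            "^(\"?)(?<ITEM>[^,]*)\\1,\"(?<TAG>[^,]*)\",\"(?<NOTE>\\s*)\"$"));
        RULES.put("WARN: uncategorized", Pattern.compile(
            "^(\"?)(?<ITEM>[^,]*)\\1,\"(?<TAG>[^,]*)\",\"(?<NOTE>.*)\"$"));
    }

    // Returns {category, ITEM, TAG, NOTE} for the first rule that matches the line.
    static String[] classify(String line) {
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            Matcher m = rule.getValue().matcher(line);
            if (m.matches()) {
                return new String[] {
                    rule.getKey(), m.group("ITEM"), m.group("TAG"), m.group("NOTE")
                };
            }
        }
        // Assumption: lines that match no rule are reported with a placeholder label.
        return new String[] { "UNMATCHED", "", "", line };
    }

    public static void main(String[] args) {
        String[] result = classify("\"12345\",\"590\",\"sent to bindery 3/14\"");
        System.out.println(String.join(" | ", result));
    }
}
```

Because the catch-all pattern is listed last, more specific categories get the first chance to claim a line, which mirrors the ordering of the sample rule file.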

Fix Money parse issue

13 Mar 18:39


3.0.2

Fix money parse issue

Validate Property Files, Add Report Clarity to Tasks

11 Mar 00:10


Validate property files before launching a task


Display useful error message


Add Metadata Registry Parameter to the DSpace Ingest Folder Create and Ingest Folder Validate Tasks


Flag Metadata Fields not found in the Metadata Registry


Sample Metadata Registry Format


Sample Code to Create the JSON Dump of the Metadata Registry

https://gist.github.com/terrywbrady/f39e88b39dc1a9a05ec4

Create money object that formats money values while performing only integer arithmetic
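One way such a money object could work is to store the amount as a whole number of cents and split the decimal string by hand, so no floating-point value is ever created. The sketch below is an illustration under that assumption. The class name MoneySketch and its parse, plus, and format methods are hypothetical and do not reflect the File Analyzer's actual class. It also assumes US-style formatting with two decimal places.

```java
import java.util.Locale;

// Hypothetical sketch of an integer-only money value; not the File Analyzer class.
final class MoneySketch {
    private final long cents;

    MoneySketch(long cents) {
        this.cents = cents;
    }

    // Parses text such as "$1,234.50" or "-3.5" without using double or float.
    static MoneySketch parse(String text) {
        String s = text.trim().replace("$", "").replace(",", "");
        boolean negative = s.startsWith("-");
        if (negative) {
            s = s.substring(1);
        }
        int dot = s.indexOf('.');
        String whole = dot < 0 ? s : s.substring(0, dot);
        String frac = dot < 0 ? "" : s.substring(dot + 1);
        if (frac.length() > 2) {
            throw new NumberFormatException("More than two decimal places: " + text);
        }
        frac = (frac + "00").substring(0, 2); // pad "5" to "50", "" to "00"
        long value = Long.parseLong(whole.isEmpty() ? "0" : whole) * 100
                + Long.parseLong(frac);
        return new MoneySketch(negative ? -value : value);
    }

    MoneySketch plus(MoneySketch other) {
        return new MoneySketch(cents + other.cents);
    }

    // Formats with a dollar sign, thousands separators, and two decimals.
    String format() {
        long abs = Math.abs(cents);
        return (cents < 0 ? "-" : "") + "$"
                + String.format(Locale.US, "%,d.%02d", abs / 100, abs % 100);
    }
}
```

Summing parsed values with plus keeps totals exact, which avoids the rounding drift that appears when currency amounts are added as floating-point numbers.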


Add filter for Marc files


Optimize merge logic

The prior version performed the merge within a table model. The merge is now performed in memory before the table model is created, and merge counts are displayed.


Validate Property Files Before Executing A Task

26 Feb 18:34


Note: The contents of this release apply to branch sd-277

Create a facility to validate properties/property files before launching a task


Add Metadata Registry Parameter to the DSpace Ingest Folder Create and Ingest Folder Validate Tasks


Flag Metadata Fields not found in the Metadata Registry


Sample Metadata Registry Format


Add Money Type to Allowable Result Columns


Sample Code to Create the JSON Dump of the Metadata Registry

https://gist.github.com/terrywbrady/f39e88b39dc1a9a05ec4

Create de-dup files based on the presence of a key

05 Feb 20:32


Input File: foo.txt

count key dedup

Assuming the first instance of a key should be retained, use the following output files:

  • foo.dedup.txt: All records without a duplicated key + the first instance of a duplicated key
  • foo.dup-drop.txt: All records with a duplicated key (excluding the first instance of that key)

Assuming that duplicate entries require review, use the following output files:

  • foo.no-dup.txt: All records without a duplicated key
  • foo.all-dup.txt: All records with a duplicated key
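The four output files described above can be sketched as a two-pass split: count each key first, then route every record. This is a minimal illustration only. The class name DedupSketch, its field names, and the tab-delimited sample records are hypothetical, and the sketch keeps results in lists rather than writing files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch; each list stands in for one of the output files.
final class DedupSketch {
    final List<String> dedup = new ArrayList<>();   // foo.dedup.txt
    final List<String> dupDrop = new ArrayList<>(); // foo.dup-drop.txt
    final List<String> noDup = new ArrayList<>();   // foo.no-dup.txt
    final List<String> allDup = new ArrayList<>();  // foo.all-dup.txt

    static DedupSketch split(List<String> records, Function<String, String> keyOf) {
        // Pass 1: count how many records share each key.
        Map<String, Integer> counts = new HashMap<>();
        for (String record : records) {
            counts.merge(keyOf.apply(record), 1, Integer::sum);
        }
        // Pass 2: route each record into one list of each pair.
        DedupSketch out = new DedupSketch();
        Set<String> seen = new HashSet<>();
        for (String record : records) {
            String key = keyOf.apply(record);
            boolean firstInstance = seen.add(key); // true only the first time a key appears
            boolean duplicatedKey = counts.get(key) > 1;
            (firstInstance ? out.dedup : out.dupDrop).add(record);
            (duplicatedKey ? out.allDup : out.noDup).add(record);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a\t1", "b\t2", "a\t3", "c\t4", "a\t5");
        DedupSketch out = split(records, r -> r.split("\t")[0]);
        System.out.println("dedup:    " + out.dedup);
        System.out.println("dup-drop: " + out.dupDrop);
        System.out.println("no-dup:   " + out.noDup);
        System.out.println("all-dup:  " + out.allDup);
    }
}
```

Every record lands in exactly one list of each pair, so dedup plus dup-drop, and likewise no-dup plus all-dup, each account for the whole input.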

Remove dc.date.created check from the Ingest Folder Create

27 Jan 19:41


This check was specific to GU's ingest standards, so it has been moved into a specialized version of this rule.

Improve formatting of dc.subject.lcsh in ProQuest ETD conversions

15 Jan 18:29


Some codes need to generate multiple dc.subject.lcsh elements.

Change the separator character from "; " to " -- ".