Releases: Georgetown-University-Libraries/File-Analyzer
Integrate Marc4j, Build APTrust Bag, Simplify classpath entries
Marc4j Integration
With the creation of a Maven-compatible version of Marc4j (https://github.com/ksclarke/freelib-marc4j), several Marc-related tasks have been integrated into the FileAnalyzer. These tasks formerly resided in https://github.com/Georgetown-University-Libraries/Marc-File-Analyzer
Classpath Simplification
Use built-in Eclipse variable for Maven dependencies.
Create APTrust compliant bag files
Fix bug in Ingest Folder Creation when a zip file is generated
v3.0.4 Fix error in which a zip file prevents rerunning the ingest build
DSpace Ingest: Create an optional "collections" file
When processing an ingest inventory, if the 5th column header is "collections", a collections file is generated for each row whose 5th column contains a value.
When a value is found for "collections" generate a file named "collections" within the ingest folder.
When validating ingest folders, allow for the presence of a "collections" file within an ingest folder.
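The collections behavior described above can be sketched roughly as follows. This is a minimal illustration, not the FileAnalyzer implementation: the class and method names are hypothetical, the first four column names and the handle value are placeholders, and it assumes the cell value is written to the file as-is.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration; not part of the FileAnalyzer code base.
class CollectionsFileSketch {
    // Index of the 5th inventory column (0-based).
    static final int COLLECTIONS_COL = 4;

    // True when the inventory header names "collections" as its 5th column.
    static boolean hasCollectionsColumn(String[] header) {
        return header.length > COLLECTIONS_COL
            && "collections".equals(header[COLLECTIONS_COL].trim());
    }

    // Writes a file named "collections" into the ingest folder when the row
    // carries a value in the 5th column. Returns the path written, or null
    // when the cell is missing or blank.
    // Assumption: the cell value is written unchanged, followed by a newline.
    static Path writeCollections(Path ingestFolder, String[] row) throws IOException {
        if (row.length <= COLLECTIONS_COL) {
            return null;
        }
        String value = row[COLLECTIONS_COL].trim();
        if (value.isEmpty()) {
            return null;
        }
        Path out = ingestFolder.resolve("collections");
        Files.write(out, (value + "\n").getBytes(StandardCharsets.UTF_8));
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path folder = Files.createTempDirectory("ingest");
        // Placeholder column names and values; only the 5th header is from the notes.
        String[] header = {"col1", "col2", "col3", "col4", "collections"};
        String[] row = {"item1", "a.pdf", "A title", "2015", "123456789/10"};
        if (hasCollectionsColumn(header)) {
            System.out.println(writeCollections(folder, row));
        }
    }
}
```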
Modify Multiparser Rules to allow categorization of results
Sample Parser Rule File
[COLS]
ITEM,TAG,NOTE
[PATTERNS]
# Group Marc Notes that Need Attention
[CATEGORY: ERROR: woodstock]
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","(?<NOTE>.*(woodstock).*)"$
[CATEGORY: ERROR: bindery]
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","(?<NOTE>.*(bindery).*)"$
[CATEGORY: ERROR: damaged]
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","(?<NOTE>.*(damaged?|flood).*)"$
[CATEGORY: ERROR: nostorage]
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","(?<NOTE>.*(nostorage).*)"$
[CATEGORY: ERROR: preservation]
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","(?<NOTE>.*(preservation|repair|infirmary|marked|loose|poor condition).*)"$
[CATEGORY: ERROR: may-prob]
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","(?<NOTE>.*(mayprob).*)"$
[CATEGORY: ERROR: durkin]
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","(?<NOTE>.*(durkin).*)"$
# Group marc notes that may be skippable
[CATEGORY: SKIP: initial]
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","(?<NOTE>[a-zA-Z]{1,3})"$
[CATEGORY: SKIP: pamphlet]
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","(?<NOTE>pamphlet.*)"$
[CATEGORY: SKIP: initial-date]
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","\s*(?<NOTE>[a-zA-Z]{1,3}\s*\d{1,2}/\d{2,4})\s*"$
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","\s*(?<NOTE>[a-zA-Z]{1,3}\s*\d{1,2}/\d{1,2}/\d{2,4})\s*"$
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","\s*(?<NOTE>\d{1,2}/\d{2,4}\s*[a-zA-Z]{1,3})\s*"$
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","\s*(?<NOTE>\d{1,2}/\d{1,2}/\d{2,4}\s*[a-zA-Z]{1,3})\s*"$
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","\s*(?<NOTE>\d{1,2}/\d{2,4})\s*"$
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","\s*(?<NOTE>\d{1,2}/\d{1,2}/\d{2,4})\s*"$
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","\s*(?<NOTE>(Mon|Tue|Wed|Thu|Fri|Sat|Sun) .*)"$
[CATEGORY: SKIP: cls-dac-ppc]
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","(?<NOTE>\s*(cls nos|dac |ppc).*)"$
[CATEGORY: SKIP: hold]
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","(?<NOTE>\s*(hold in circ).*)"$
[CATEGORY: SKIP: shipment]
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","(?<NOTE>.*(shipi?ment).*)"$
[CATEGORY: SKIP: cancelled]
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","(?<NOTE>.*(cancelled).*)"$
[CATEGORY: SKIP: header]
^("?)(?<ITEM>(record_num|Short Item ID)*)\1,"(?<TAG>[^,]*)","(?<NOTE>.*)"$
[CATEGORY: SKIP: blank]
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","(?<NOTE>\s*)"$
[CATEGORY: SKIP: lv610]
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","(?<NOTE>lv\s+6/10)"$
# Identify rules that may need attention
[CATEGORY: WARN: cart]
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","(?<NOTE>.*CART.*)"$
[CATEGORY: WARN: transfer]
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","(?<NOTE>.*transfer.*)"$
# Default rule for items not previously caught
[CATEGORY: WARN: uncategorized]
^("?)(?<ITEM>[^,]*)\1,"(?<TAG>[^,]*)","(?<NOTE>.*)"$
[CATEGORY: WARN: no-quote]
^("?)(?<ITEM>[^,]*)\1,(?<TAG>[^,]*),(?<NOTE>.*)$
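One way to read the sample rule file is as an ordered list in which the first matching pattern assigns the category, which is why the catch-all rules come last. The sketch below illustrates that reading with four of the patterns from the sample. It is not the Multiparser implementation: the class and method names are hypothetical, and first-match-wins ordering and case-sensitive matching are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of categorized rules; not the Multiparser code.
class CategoryRuleSketch {
    // Shared prefix and suffix of the quoted-field patterns in the sample file.
    static final String PREFIX = "^(\"?)(?<ITEM>[^,]*)\\1,\"(?<TAG>[^,]*)\",\"";
    static final String SUFFIX = "\"$";

    // LinkedHashMap keeps the rules in file order (assumed: first match wins).
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("ERROR: bindery",
            Pattern.compile(PREFIX + "(?<NOTE>.*(bindery).*)" + SUFFIX));
        RULES.put("SKIP: blank",
            Pattern.compile(PREFIX + "(?<NOTE>\\s*)" + SUFFIX));
        RULES.put("WARN: uncategorized",
            Pattern.compile(PREFIX + "(?<NOTE>.*)" + SUFFIX));
        RULES.put("WARN: no-quote",
            Pattern.compile("^(\"?)(?<ITEM>[^,]*)\\1,(?<TAG>[^,]*),(?<NOTE>.*)$"));
    }

    // Returns {category, ITEM, TAG, NOTE} for the first matching rule,
    // or null when no rule matches the line.
    static String[] categorize(String line) {
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            Matcher m = rule.getValue().matcher(line);
            if (m.matches()) {
                return new String[] {
                    rule.getKey(), m.group("ITEM"), m.group("TAG"), m.group("NOTE")
                };
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Made-up input lines in the ITEM,TAG,NOTE layout.
        String[] lines = {
            "\"i123\",\"945\",\"sent to bindery 3/14\"",
            "\"i124\",\"945\",\"\"",
            "\"i126\",\"945\",\"call patron\"",
            "i125,945,needs review"
        };
        for (String line : lines) {
            System.out.println(String.join(" | ", categorize(line)));
        }
    }
}
```

The named groups correspond to the ITEM, TAG and NOTE columns declared under [COLS], and the `\1` backreference requires the optional opening quote around ITEM to be closed by a matching quote.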
Fix Money parse issue
3.0.2 fix money parse issue
Validate Property Files, Add Report Clarity to Tasks
Validate property files before launching a task
Display useful error message
Add Metadata Registry Parameter to the DSpace Ingest Folder Create and Ingest Folder Validate Tasks
Flag Metadata Fields not found in the Metadata Registry
Sample Metadata Registry Format
Sample Code to Create the JSON Dump of the Metadata Registry
https://gist.github.com/terrywbrady/f39e88b39dc1a9a05ec4
Create money object that formats money values while performing only integer arithmetic
Add filter for Marc files
Optimize merge logic
The prior version performed the merge within a table model. The merge is now performed in memory before the table model is created, and merge counts are shown.
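As an illustration of the money object listed in this release, the sketch below stores an amount as a whole number of cents so that parsing, addition and formatting never touch floating point. It is a minimal sketch under assumptions, not the FileAnalyzer class: the name `MoneySketch`, the accepted input forms (optional "$", "-" and thousands commas, at most two decimal places) and the US-style output format are all hypothetical.

```java
import java.util.Locale;

// Hypothetical money type for illustration; not the FileAnalyzer implementation.
class MoneySketch {
    // Amount held as a whole number of cents, so all arithmetic is integer-only.
    private final long cents;

    MoneySketch(long cents) {
        this.cents = cents;
    }

    // Parses strings such as "$1,234.50", "5" or "-0.05" without using double.
    static MoneySketch parse(String s) {
        String t = s.trim().replace("$", "").replace(",", "");
        boolean negative = t.startsWith("-");
        if (negative) {
            t = t.substring(1);
        }
        int dot = t.indexOf('.');
        String whole = dot < 0 ? t : t.substring(0, dot);
        String frac = dot < 0 ? "" : t.substring(dot + 1);
        if (frac.length() > 2) {
            throw new NumberFormatException("More than two decimal places: " + s);
        }
        // Pad "5" to "50" and "" to "00" so the fraction is always in cents.
        while (frac.length() < 2) {
            frac += "0";
        }
        long value = (whole.isEmpty() ? 0 : Long.parseLong(whole)) * 100
            + Long.parseLong(frac);
        return new MoneySketch(negative ? -value : value);
    }

    MoneySketch plus(MoneySketch other) {
        return new MoneySketch(cents + other.cents);
    }

    long toCents() {
        return cents;
    }

    @Override
    public String toString() {
        long abs = Math.abs(cents);
        // Integer division and remainder split dollars from cents.
        return String.format(Locale.US, "%s$%,d.%02d",
            cents < 0 ? "-" : "", abs / 100, abs % 100);
    }

    public static void main(String[] args) {
        MoneySketch total = parse("$1,234.50").plus(parse("0.75"));
        System.out.println(total);
    }
}
```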
Validate Property Files Before Executing A Task
Note: The contents of this release apply to branch sd-277
Create a facility to validate properties/property files before launching a task
Add Metadata Registry Parameter to the DSpace Ingest Folder Create and Ingest Folder Validate Tasks
Flag Metadata Fields not found in the Metadata Registry
Sample Metadata Registry Format
Add Money Type to Allowable Result Columns
Sample Code to Create the JSON Dump of the Metadata Registry
Create de-dup files based on the presence of a key
Input File: foo.txt
Assuming the first instance of a key should be retained, use the following output files
- foo.dedup.txt: All records without a duplicated key + the first instance of a duplicated key
- foo.dup-drop.txt: All records with a duplicated key (excluding the first instance of that key)
Assuming that duplicate entries require review
- foo.no-dup.txt: All records without a duplicated key
- foo.all-dup.txt: All records with a duplicated key
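The four outputs listed above can be produced with one counting pass and one distribution pass. The sketch below is an illustration only, not the FileAnalyzer task: the class name is hypothetical, records are held in memory rather than written to foo.*.txt, and the key extraction (first tab-separated field in the demo) is an assumption, since the notes do not say how the key is defined.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical illustration of the de-dup split; not the FileAnalyzer task.
class DedupSketch {
    // Splits records into the four groups named after the output file suffixes:
    // "dedup", "dup-drop", "no-dup" and "all-dup".
    static Map<String, List<String>> split(List<String> records,
                                           Function<String, String> keyOf) {
        // Pass 1: count how often each key occurs.
        Map<String, Integer> counts = new HashMap<>();
        for (String record : records) {
            counts.merge(keyOf.apply(record), 1, Integer::sum);
        }

        Map<String, List<String>> out = new LinkedHashMap<>();
        for (String name : new String[] {"dedup", "dup-drop", "no-dup", "all-dup"}) {
            out.put(name, new ArrayList<>());
        }

        // Pass 2: route each record into one list of each pair.
        Set<String> seen = new HashSet<>();
        for (String record : records) {
            String key = keyOf.apply(record);
            boolean firstInstance = seen.add(key);
            boolean duplicatedKey = counts.get(key) > 1;
            // Keep-first view: first instance of every key vs. later repeats.
            out.get(firstInstance ? "dedup" : "dup-drop").add(record);
            // Review view: keys that occur once vs. keys that occur more than once.
            out.get(duplicatedKey ? "all-dup" : "no-dup").add(record);
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up records; the key is assumed to be the first tab-separated field.
        List<String> records = Arrays.asList("a\t1", "b\t2", "a\t3", "c\t4", "b\t5");
        Map<String, List<String>> result = split(records, r -> r.split("\t", 2)[0]);
        for (Map.Entry<String, List<String>> e : result.entrySet()) {
            System.out.println("foo." + e.getKey() + ".txt: " + e.getValue().size() + " records");
        }
    }
}
```

Each pair of outputs partitions the input: dedup plus dup-drop, and no-dup plus all-dup, each contain every input record exactly once.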
Remove dc.date.created check from the Ingest Folder Create
This test was specific to GU's ingest standards. Moved this check into a specialized version of this rule.
Improve formatting of dc.subject.lcsh in ProQuest ETD conversions
Some codes need to generate multiple dc.subject.lcsh elements.
Change the separator character from "; " to " -- ".