Commit
Strip out all special characters from csv headers (#918)
* Strip special characters out of header names. Excel likes to leave odd Unicode items, including the Unicode BOM, lying around. This causes havoc. By stripping them right at the start, we should prevent invisible characters from being saved to raw_metadata and the other places they get stuck.
orangewolf authored Feb 9, 2024
1 parent 5c8b264 commit 3571792
Showing 1 changed file with 2 additions and 1 deletion.
3 changes: 2 additions & 1 deletion app/models/bulkrax/csv_entry.rb
@@ -16,11 +16,12 @@ def self.fields_from_data(data)
class_attribute(:csv_read_data_options, default: {})

# there's a risk that this reads the whole file into memory and could cause a memory leak
+ # we strip any special characters out of the headers. looking at you Excel
def self.read_data(path)
raise StandardError, 'CSV path empty' if path.blank?
options = {
headers: true,
- header_converters: ->(h) { h.to_s.strip.to_sym },
+ header_converters: ->(h) { h.to_s.gsub(/[^\w\d\. -]+/, '').strip.to_sym },
encoding: 'utf-8'
}.merge(csv_read_data_options)
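
To make the effect of the header_converters change concrete, here is a minimal standalone sketch that compares the old and new converters on headers of the kind Excel can produce. The constant names and the sample CSV text are hypothetical and exist only for illustration; the two lambdas are copied from the diff. It uses Ruby's standard csv library directly rather than Bulkrax.

```ruby
require 'csv'

# Converter before this commit: only trims surrounding whitespace.
OLD_CONVERTER = ->(h) { h.to_s.strip.to_sym }

# Converter after this commit: also drops every character that is not
# a word character, digit, dot, space, or hyphen.
NEW_CONVERTER = ->(h) { h.to_s.gsub(/[^\w\d\. -]+/, '').strip.to_sym }

# Hypothetical sample data: a UTF-8 BOM (U+FEFF) in front of the first
# header and a zero-width space (U+200B) inside the second one.
SAMPLE_CSV = "\uFEFFtitle,source\u200B identifier\nA Book,abc123\n"

# String#strip does not treat the BOM as whitespace, so it survives.
old_first = OLD_CONVERTER.call("\uFEFFtitle")
# The new converter removes it, leaving the plain :title symbol.
new_first = NEW_CONVERTER.call("\uFEFFtitle")

# Parse the sample with the new converter, using the same options
# that the diff passes to CSV (headers: true plus header_converters).
table = CSV.parse(SAMPLE_CSV, headers: true, header_converters: NEW_CONVERTER)
clean_headers = table.headers

puts old_first.inspect
puts new_first.inspect
puts clean_headers.inspect
```

One design consequence worth keeping in mind: the pattern is an allow-list, so it removes more than invisible characters. Punctuation such as parentheses or colons is dropped from headers as well, and because Ruby's \w matches only ASCII word characters by default, accented letters would be dropped too.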
