
Regex capturing not working as one would expect (Failed to compose renaming scheme) #668

Open
gr4nt3d opened this issue Feb 27, 2024 · 0 comments


gr4nt3d commented Feb 27, 2024

I wanted to rename files according to the following scheme:

| Original data | More data | Rename |
|---|---|---|
| first middle surname of authors | title | surname_f(m)_title |
| Author McAuthentic | How to waste time with regex | mcauthentic_a_how_to_waste... (depends on cutoff) |
| Adam Adams; Bob Bobsen | Our-Very-Important Book | adams_a_bobsen_b_our_very_important_book |
| ... | ... | ... |

The middle names are optional, and I have already given up on them.
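To pin down the target format independently of ZotFile, here is a minimal Python sketch of the renaming rule in the scheme above. It assumes the authors are already available as (first name, surname) pairs; the helpers `slug` and `target_name` and the `max_len` cutoff are hypothetical names for illustration, not part of ZotFile.

```python
import re
from typing import List, Optional, Tuple

def slug(text: str) -> str:
    """Lowercase and join runs of ASCII letters/digits with underscores.
    Any other character (spaces, hyphens, non-ASCII letters) acts as a separator."""
    return "_".join(re.findall(r"[a-z0-9]+", text.lower()))

def target_name(authors: List[Tuple[str, str]], title: str,
                max_len: Optional[int] = None) -> str:
    """Hypothetical helper: surname_f for each author, then the title slug.
    Middle names are ignored; max_len is an assumed hard cutoff."""
    parts: List[str] = []
    for first, surname in authors:
        parts.append(slug(surname))
        if first:  # skip the initial when no first name is known
            parts.append(first[0].lower())
    name = "_".join(parts + [slug(title)])
    return name if max_len is None else name[:max_len]

print(target_name([("Adam", "Adams"), ("Bob", "Bobsen")], "Our-Very-Important Book"))
# adams_a_bobsen_b_our_very_important_book
print(target_name([("Author", "McAuthentic")], "How to waste time with regex", max_len=26))
# mcauthentic_a_how_to_waste
```

The cutoff here is a plain character slice; how ZotFile itself truncates long names is not modelled.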

I tried to find this information on the website, but the description of how users can define wildcards does not cover everything I needed. Googling and a code search were unsuccessful. ChatGPT helped a little but did not grasp the problem. According to regex101.com, my regex (below) should do the job, but I am too inexperienced to rule out a simple mistake on my part. In any case, Zotero/ZotFile does not accept it: it either just returns the full authorLastG value I used or does not capture all the necessary information.

Zotero-Version: 6.0.33 (homebrew)
ZotFile-Version: 5.1.2
OS: macOS

Ideal fix: please provide another way to make such formats easy to achieve.


My test case for the regex capture:

I am aware that this test case is not sufficient on its own, but it should still be a relevant MWE. If it works, a sequence of regex operations and appends, followed by a final replace and toLowerCase, would be enough (so the operations must be applied in sequence). In the end, the information from authorLastG (e.g. _surname,_firstname_surname2,_firstname2_m.) should be extracted, and the format would look something like this: {%1_}{%t}

My regex:

```
([a-zA-Z]+)\,_([a-zA-Z])?(?:[a-zA-Z_\.]+?(?=[a-zA-Z]+\,|$))?(?:([a-zA-Z]+)\,_([a-zA-Z])?(?:[a-zA-Z_\.]+?(?=[a-zA-Z]+\,|$))?)?
```

ChatGPT's attempt:

```
([A-Za-z]+)\s*,\s*([A-Za-z])?(?:[A-Za-z_\.]+?(?=[A-Za-z]+\s*,|$))?(?:\s*,\s*([A-Za-z]+)\s*([A-Za-z])?(?:[A-Za-z_\.]+?(?=[A-Za-z]+\s*,|$))?)?
```
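One way to see why the two patterns behave differently is to run them outside Zotero. The Python sketch below applies both to one of the authorLastG test values from this issue and returns the four capture groups. It assumes Python's `re` treats these constructs (lazy quantifiers, lookaheads, optional groups) the same way as the JavaScript regex engine ZotFile runs on; an unmatched group is `None` here and `undefined` in JavaScript. The helper name `groups` is hypothetical.

```python
import re

# The two patterns from this issue, verbatim (raw strings keep the backslashes).
MINE = r"([a-zA-Z]+)\,_([a-zA-Z])?(?:[a-zA-Z_\.]+?(?=[a-zA-Z]+\,|$))?(?:([a-zA-Z]+)\,_([a-zA-Z])?(?:[a-zA-Z_\.]+?(?=[a-zA-Z]+\,|$))?)?"
CHATGPT = r"([A-Za-z]+)\s*,\s*([A-Za-z])?(?:[A-Za-z_\.]+?(?=[A-Za-z]+\s*,|$))?(?:\s*,\s*([A-Za-z]+)\s*([A-Za-z])?(?:[A-Za-z_\.]+?(?=[A-Za-z]+\s*,|$))?)?"

def groups(pattern: str, value: str):
    """Capture groups of the first match (similar to JS RegExp.exec), or None."""
    m = re.search(pattern, value)
    return m.groups() if m else None

sample = "sutton,_richard_s._barto,_andrew_g._"
print(groups(MINE, sample))     # ('sutton', 'r', 'barto', 'a')
print(groups(CHATGPT, sample))  # ('sutton', None, None, None)
```

In the second pattern, nothing consumes the `_` after the first comma before the initial group, so the initial stays empty, and its second-author part expects a comma before the surname (`\s*,\s*([A-Za-z]+)`), which authorLastG does not contain.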

Test:

```json
{
  "2": {
      "field": "authorLastG",
      "operations": [
          {
              "function": "exec",
              "regex": "([a-zA-Z]+)\\,_([a-zA-Z])?(?:[a-zA-Z_\\.]+?(?=[a-zA-Z]+\\,|$))?(?:([a-zA-Z]+)\\,_([a-zA-Z])?(?:[a-zA-Z_\\.]+?(?=[a-zA-Z]+\\,|$))?)?",
              "group": 2
          },
          {
              "function": "toLowerCase"
          }
      ]
  }
}
```
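To check what this particular config should yield, here is a small Python emulation of the two operations it uses, applied in order. It assumes that `exec` returns the capture group numbered by `group` (0 being the whole match, as in the array returned by JavaScript's `RegExp.prototype.exec`). Keeping the input unchanged when nothing matches is a hypothetical choice, not documented ZotFile behaviour, and `apply_operations` is a hypothetical name.

```python
import json
import re

CONFIG = r'''
{
  "2": {
      "field": "authorLastG",
      "operations": [
          {"function": "exec",
           "regex": "([a-zA-Z]+)\\,_([a-zA-Z])?(?:[a-zA-Z_\\.]+?(?=[a-zA-Z]+\\,|$))?(?:([a-zA-Z]+)\\,_([a-zA-Z])?(?:[a-zA-Z_\\.]+?(?=[a-zA-Z]+\\,|$))?)?",
           "group": 2},
          {"function": "toLowerCase"}
      ]
  }
}
'''

def apply_operations(value, operations):
    """Hypothetical emulation: apply each operation to the previous result, in order."""
    for op in operations:
        if op["function"] == "exec":
            m = re.search(op["regex"], value)
            group = op.get("group", 0)
            # Assumption: keep the input if there is no match or the group is empty.
            if m and m.group(group) is not None:
                value = m.group(group)
        elif op["function"] == "toLowerCase":
            value = value.lower()
        else:
            raise ValueError(f"not emulated: {op['function']}")
    return value

wildcard = json.loads(CONFIG)["2"]
print(apply_operations("sutton,_richard_s._barto,_andrew_g._", wildcard["operations"]))  # r
```

With `"group": 2`, this pattern selects the first-name initial of the first author (`r` here); groups 1, 3 and 4 hold the first surname, the second surname and the second initial.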

Some authorLastG test cases for regex101:

```
brunton,_steven_l._kutz,_j._nathan_
bishop,_christopher_m._
sutton,_richard_s._barto,_andrew_g._
cai,_qingpeng_et_al_
konda,_vijay_tsitsiklis,_john_
konda,_vijay_tsitsiklis,_john
li,_yuxi_
lillicrap,_timothy_p._et_al_
mcauthentic,__
tsitsiklis,_john_n_van_roy,_benjamin_
sutton,_richard_s._et_al
sutton,_richard_s._barto,_andrew_g.
mitchell,_thomas
szepesvari,_c.
```
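The whole test list can also be checked at once. This sketch joins the four groups of the regex from this issue into the surname_f_surname2_f2 prefix of the renaming scheme (lowercased, skipping groups that did not match). The helper `author_prefix` is hypothetical, and like regex101 it only reflects the regex itself, not how ZotFile combines wildcards.

```python
import re

PATTERN = r"([a-zA-Z]+)\,_([a-zA-Z])?(?:[a-zA-Z_\.]+?(?=[a-zA-Z]+\,|$))?(?:([a-zA-Z]+)\,_([a-zA-Z])?(?:[a-zA-Z_\.]+?(?=[a-zA-Z]+\,|$))?)?"

TEST_CASES = [
    "brunton,_steven_l._kutz,_j._nathan_",
    "bishop,_christopher_m._",
    "sutton,_richard_s._barto,_andrew_g._",
    "cai,_qingpeng_et_al_",
    "konda,_vijay_tsitsiklis,_john_",
    "konda,_vijay_tsitsiklis,_john",
    "li,_yuxi_",
    "lillicrap,_timothy_p._et_al_",
    "mcauthentic,__",
    "tsitsiklis,_john_n_van_roy,_benjamin_",
    "sutton,_richard_s._et_al",
    "sutton,_richard_s._barto,_andrew_g.",
    "mitchell,_thomas",
    "szepesvari,_c.",
]

def author_prefix(value: str) -> str:
    """Join surname1, initial1, surname2, initial2, skipping groups that did not match."""
    m = re.search(PATTERN, value)
    if m is None:
        return ""
    return "_".join(g.lower() for g in m.groups() if g)

for case in TEST_CASES:
    print(f"{case:40} -> {author_prefix(case)}")
```

On these inputs the pattern itself seems to behave as intended, with two visible limitations: a multi-word surname such as "van roy" is reduced to its last word (`tsitsiklis_j_roy_b`), and a missing first name leaves only the surname (`mcauthentic`).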