Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String.join includes an extra copy of separator at the end of the string. #593

Open
mayhewsw opened this issue Dec 13, 2017 · 3 comments
Open
Assignees

Comments

@mayhewsw
Copy link
Member

There is a .trim() call, but this only trims whitespace. If the separator is non-whitespace (e.g. underscore, dash, period, bar, etc.), then trim doesn't work.

Also, Apache Commons has a StringUtils that works very nicely. Perhaps we want to remove this in favor of that.

@mssammon
Copy link
Contributor

mssammon commented Sep 6, 2018

I agree that we should replace cogcomp's stringutils with the apache commons library. Probably a fair number of uses throughout the package, though.

@schen149
Copy link
Contributor

Currently there are some feature generators in Edison that calls the StringUtils.join() to generate feature, which includes an extra copy of dash in the feature name. Example below.

features.add(DiscreteFeature.create(i + ":" + StringUtils.join("-", strings)));

Switching to Apache Commons will change the feature names, and so break models that relies on Edison (PrepSRL, I think). What's the best thing to do here? @mssammon

@cogcomp-dev
Copy link

probably, requires retraining. Create a branch "requires_retrain" and make the change on this branch. Once implemented, create a new issue for prepsrl retraining and refer to it in the branch notes. presumably, it is sufficient to retrain prepsrl models and deploy, then update the dependencies on the branch; at that point, it can be merged. (You are not obligated to take that new prepsrl issue, but can if you want.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants