-
Notifications
You must be signed in to change notification settings - Fork 252
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove rules from Slugify and move them into files #81
Conversation
There is a generate-rules.php script that reads the JSON files and updates a DefaultRuleProvider which allows Slugify to still be very fast. In addition this PR cleans up a lot of stuff, this is required to keep the code clean, maintainable, readable and testable when we add features in the future.
ping @mac2000 What do you think about these changes?? |
To me it is totally OK, just because we are not breaking interface we can made such changes and if we will not like them we always may move back to flat file. But on the other hand comments like "Do not touch - this is generated code" is something new to PHP and to me looks more like a T4 from .Net :) To be fair this PR not solving problem with slugify configuration - no matter what we will do it will not work out of the box for absolute everyone and some configuration should be applied to match concrete locale. As for looooooong rules array I do agree with you, just wondering why not just move rules to rulesets then they may be collapsed in any editor and there will be no need to add one more step with "compilation" of rules. But on the other hand having JSON is a cool thing just because we easily can provide same slugifier for front-end. BTW So applying such changes is a matter of preferences, to me, until I can just get library and use it with all defaults without worrying about internal - its all ok - we should not break that probably in first place, all internal is just all about what we like and do not like in code Regards PS: One more thing with separate rules we have one problem, for example, imagine that I going to add Ukrainian, should I just add all letters or only that is not already there? How should I know that? You may say just to add letters that is not already there - but now I do need look at many files instead of one. On the other hand I can just add whole alphabet - but then we will make unwanted work with duplicating rules. What I mean, for example cyrillic letters in all languages are transliterated same way almost always, troubles are with some specific letters but it will be bad if to contribute some new rule people must do much more work than they should. BTW rules are not changing in time, there is just additions off missing letters, and conflicting ones is a problem that is not solved still Sorry for so long message |
Yeah, we're probably not going to find a one size fits all solution, because depending on your preferred language you want to have different transliterations. I dislike the current "solution" because users have to specifically overwrite the rules, that is, they have to know these rules. With this PR you could just say "prefer Turkish rules" and you do not need know what these actually mean. Regarding the code generation part: do you think it would be better if we generate the full file, not only the rules array? Thanks for pointing |
I am totally agree with your point about benefits for end users of slugify it will be much simpler, and even more somewhere in the future slugify may look at current locale and choose right rules which may make it work out of the box for some people. About code generation - personally I am ok with that just because I do see it in other languages, but can not say anything about overall PHP folks, its something new and may be confusing for someone. If the only reason we are moving rules out of main file - to make it more readable (but more complex on the other hand) then probably it is a way. On the other hand there is other way to split rules and still make it all readable and easy to maintain with rulesets (there was commit for Esperanto if I remember correct) we can just move all rules into multidimensional array so all will work like now, and your idea with language will be implemented and there will no be need to extra compilation steps, but it is up to you to decide such thing |
why don't use yaml files instead of json you have syntax highlighting and linting and you can use comments in the files |
@OskarStark I can say same about ini files :) its just a matter of preference, someone like yaml, someone json, probably there is cases when someone prefer xml. But there is no faster/easier way than plain php array (that is why they were used from beginning) but again it is too simple so many of us prefer to overengineer things. E.g. in case of yaml we will depend on its library, but for slugify which consists of two files one of which is interface - it is probably too much :) |
Yeah, I choose JSON because it is probably the most used one and it is built into PHP. The reason why I split it into JSON files instead of keeping them in the class is that the list grows longer and longer and since we probably need to duplicate rules it would get out of hand. True, you can collapse in an editor, but there are also Diffs and it's getting hard to find the right place to put them. |
fine for me 👍 |
Just started reviewing :) |
Remove rules from Slugify and move them into files
There are two reasons for removing the rules from the Slugify class. First of all, as more and more rules are added (which is great, because it makes the library better) the class gets longer and confusing and harder to maintain and extend. The second big reason is that the current system was not very flexible. We have a lot of problems with conflicting rules in different rules (e.g. #56 or #80) and we want to provide an easy way to change the default rules (or language).
There is a generate-rules.php script that reads the JSON files and updates a DefaultRuleProvider which allows Slugify to still be very fast.
In addition this PR cleans up a lot of stuff, this is required to keep the code clean, maintainable, readable and testable when we add features in the future.
The regular expression has been moved into the options array and a new parameter was added to the constructor called
provider
. A provider is a class that implementsRuleProviderInterface
, which defines agetRules()
method. The method expects aruleset
parameter and the provider should return the rules for the given ruleset. If no provider is given than an instance ofDefaultRuleProvider
is created. This class is generated based on the content of the JSON files inResources/rules
, however, the provider just provides the rules, they are not activated by default.Rulesets (or languages) are activated based on the new
rulesets
option. Every ruleset mentioned in this array is activated. The order of the array is important, rules in sets mentioned later can overwrite earlier rulesets. By default the following rulesets are activated:If you need Turkish (
Ä
→A
) transliterations instead of German (Ä
→AE
) you have two possibilities:A) Pass the
rulesets
option to the constructorOr B) you can activate the Turkish ruleset after initialisation.
This would solve the problems described in #56 and #80 and allows for additional, language specific rules. For example, I added the ruleset austrian which replaces
ß
withsz
instead ofss
(in German).I would appreciate comments and feedback for this PR very much, since a lot of stuff has been changed. We also need to go through the rules currently in default and move them into rulesets for the specific language (e.g., french, or spanish are missing). I am aware that some rules would be duplicated through this, but I think this situation is better than it was in the past.