Strong's Numbers, Strong's Plus, etc. #93

Open
jonathanrobie opened this issue Feb 20, 2018 · 15 comments

Comments

@jonathanrobie
Contributor

The original Abbott-Smith did not use Strong's numbers. This edition uses Strong's numbers and includes an extended form of Strong's numbers created by Alan Bunning. This is one of several approaches to extending Strong's numbers.

James Tauber's morphological lexicon collates the most common schemes for lemmatization here:

https://raw.githubusercontent.com/morphgnt/morphological-lexicon/master/lexemes.yaml

biblicalhumanities.org would like to see one scheme for lemmatization adopted widely, and I think James Tauber has probably done the most work on this particular issue.

@destatez
Contributor

In that file, what does the gk field stand for, and where does it get its values? It starts out matching Strong's, but then diverges.

@jonathanrobie
Contributor Author

I assume these are Goodrick-Kohlenberger numbers. I'll point James at this issue too.

@jonathanrobie
Contributor Author

jonathanrobie commented Feb 20, 2018

A comment on what you said here:

#91 (comment)

If it's called a Strong's number in the markup, I think it should be a Strong's number. The world is littered with things that claim to be Strong's numbers and are not because individuals decided to "improve" the scheme and did not always even document what they were changing. That means that if you find a Strong's number in one resource and look it up in another resource, it does not work.

There are several "extended" Strong's numbers, but most of these are not designed to be extended to a larger corpus that might include, for instance, the Church Fathers. That's a problem if you want Catenae (a resource that shows where each passage is quoted and commented on in the Church Fathers) or similar resources. And for linguistic work, we also want the papyri and contemporary Hellenistic literature.

There is clearly a need for identifiers that cover all known lexemes, and that's what James is working on. That's a need that spans resources and projects.

@destatez
Contributor

See https://hermeneutics.stackexchange.com/questions/20024/what-do-the-goodrick-kohlenberger-numbers-represent-what-features-does-this-s for background. It does not look like this numbering scheme has a big following. Since most folk are on board with Strongs, Alan's idea may be a better option.

@destatez
Contributor

Alan's scheme IS bridging other Greek resources. I believe that was his main driver.

@jag3773
Contributor

jag3773 commented Feb 20, 2018

@jonathanrobie What exactly are you proposing? Are you suggesting that we should put the original Strong's numbers into A-S?

The link you posted doesn't appear to provide a distinct scheme; it provides a disambiguation file that would allow you to cross-reference multiple resources. Included in that would be the Strong's numbering scheme from Alan, since it doesn't break compatibility with the original Strong's numbers.

@jonathanrobie
Contributor Author

jonathanrobie commented Feb 20, 2018

@jag3773 There are multiple possible solutions, but I would like to satisfy these requirements:

  1. If A-S has numbers called Strong's numbers, they should not break compatibility.
  2. For identifiers not covered by Strong, we should use the same solution so that the things biblicalhumanities.org is working on are compatible with the things you are working on. Especially since there are some resources, like this one, that we both work on. On our side, James Tauber is the morphology lead.

These numbers actually occur inside the lexicon, and I want cross-reference to work across all source language resources.

@jonathanrobie
Contributor Author

> Since most folk are on board with Strongs, Alan's idea may be a better option.

I don't think we should use GK numbers: they are copyrighted and we cannot extend them. But currently, nobody is using Alan's system, and we have two groups that are each investing in different approaches that are not compatible. So what I want is to carefully consider how best to move forward, and I would like James, Ulrik, and Alan to be part of the conversation so that we can get the stakeholders on the same page.

I want one solution, but I'm not invested in any particular solution.

@destatez
Contributor

I ran the following proposal by Todd and he gives it a thumbs up. Jonathan, what is your opinion on this proposal? And you, Jesse @jag3773 ?

I'm proposing that we re-write the baseline A-S xml with Strongs Plus IDs and manually update the undefined G??? IDs to what Daniel has determined, but using the Strongs Plus, not the standard Strongs followed by a letter. We can then re-baseline that for historical reasons. I can then re-run my latest script to reformat the xml with the 2 new attributes and save that back into the original filename and baseline that as the reference moving forward. A slight spin on this would be to use the attribute name "strongsplus" instead of "strong" to make sure that users of the xml understand this modified numbering scheme.

@jonathanrobie
Contributor Author

I would like to hear from James and Ulrik and Alan before making a decision. Making the right decision is probably more important than making a quick decision, though we shouldn't dawdle either.

I see several areas of incompatibility with Strong's numbers in the enhanced numbering scheme described in Alan's project description here: https://greekcntr.org/downloads/project.pdf. I would like to think carefully about how that affects compatibility. Most resources will be indexed to traditional Strong's numbers.

Would it be helpful to use the prefix to clearly flag a number that differs from Strong's? For instance, we could use a B to indicate a Bunning number and the traditional G or H to indicate a Strong's number?

In the long run, I think we need something that can handle any Greek or Hebrew word.

@jtauber

jtauber commented Feb 20, 2018

Ulrik and I argued in our 2006 paper[1] (and I argued again in my SBL 2017 talk[2]) that a single number-to-lexeme mapping is insufficient and problematic. That is not to say we can't secondarily reference Strong's numbers (or Bunning numbers/ESN) even with their limitations, but that's just part of the story and shouldn't be the ultimate identifier. Other perfectly valid schemes to reference are A-S headwords, BDAG headwords, or LSJ headwords, but each by itself has problems. My proposal has always been a system that recognises that all these resources differ in how they lump and split lexemes, and that avoids the myth that there is only ONE RIGHT WAY of doing this.

One thing we absolutely should avoid is conflating multiple words under a single Strong's number and still claiming they are Strong's numbers. (This alone renders many resources that "use Strong's numbers" partially useless.)

As part of the work I'm doing with the Perseus project, I'm working on this issue for the entire Perseus corpus and I'm working with others to extend it to Greek papyri as well.

To the extent that people have linked together lexemes that have been lumped by some and split by others, this information is useful (if made freely available in machine-actionable form).

Claiming ONE TRUE LUMPING/SPLITTING is a lot less helpful.

[1] https://www.academia.edu/19660777/A_New_Numbering_System_for_Greek_New_Testament_Lexemes_2006_
[2] https://vimeo.com/243936959

@destatez
Contributor

The idea of having unique prefixes for non-standard Strong's numbers appeals to me. From a tooling standpoint, though, particularly with our Greek lexicon, it raises some significant configuration issues. It implies that we should intelligently drop the 5-digit number back down to 4 digits, but that number is used for folder names and is embedded in every lemma file, both for its own reference and for any link to other lemma files (folders). I would want to get the Greek New Testament tooling person in the loop to determine the impact there. Even with Alan's spreadsheet being the driver for that, it should not be that difficult to reformat his 5-digit numbers to 4 digits where the ones digit is zero, and to use a different prefix where it is not zero.
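
The reformatting rule described above could be sketched roughly as follows. This is only an illustration under a stated assumption: that a 5-digit extended number is the traditional Strong's number followed by one extra digit, with 0 meaning "unchanged". The function name `normalize_extended_id`, the default prefixes, and the unpadded `G846` style are hypothetical choices, not something taken from Alan's spreadsheet or from the existing tooling.

```python
import re

# Optional single-letter prefix followed by up to five digits, e.g. "30040" or "G08460".
_EXTENDED_ID = re.compile(r"^[A-Za-z]?(\d{1,5})$")


def normalize_extended_id(extended, strongs_prefix="G", extended_prefix="B"):
    """Map a 5-digit extended number to a prefixed identifier.

    ASSUMPTION: the last digit is a disambiguator and 0 means the number
    is a plain Strong's number. Verify this against the real data first.
    """
    match = _EXTENDED_ID.match(extended.strip())
    if match is None:
        raise ValueError(f"not a 5-digit style identifier: {extended!r}")
    number = int(match.group(1))
    base, extra = divmod(number, 10)
    if extra == 0:
        if base == 0:
            raise ValueError("zero is not a usable identifier")
        # Ones digit is zero: fall back to the traditional number.
        return f"{strongs_prefix}{base}"
    # Ones digit is non-zero: keep all five digits under a distinct prefix.
    return f"{extended_prefix}{number:05d}"
```

With these assumptions, an input of `"30040"` yields `G3004`, while `"30041"` yields `B30041`, so a reader can tell from the prefix alone whether a lookup in a traditional Strong's resource can be expected to work. Folder names and embedded links would still need the same mapping applied consistently.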

(Can we add Todd Price as a participant and get his input on this aspect?)

@jag3773 Even with my work on the tool to create the initial Hebrew and Aramaic lexicon, what are your thoughts on this numbering scheme topic? There was some earlier discussion about the "standard" Strongs not fitting with both the Hebrew and Aramaic. Would the standard 4-digit Strongs fit for the Hebrew side, specifically? Could you use a different prefix for the Aramaic when it does fit the standard Strongs? (My exposure to Aramaic is close to nil)

@jonathanrobie
Contributor Author

jonathanrobie commented Feb 21, 2018

To me, the most important immediate issue is that many resources use Strong's numbers, not just Alan's GNT, so we want to make sure that bog standard Strong's numbers are supported.

But over time, we also need a better way of doing lemmas in addition to this. Consider entries like <entry n="λέγω|G2036|G2046|G3004|G4483">, <entry n="εἰ|G1487|G1489|G1490|G1499|G1508|G1509|G1512|G1513">, <entry n="αὐτός|G846|G4571|G4671|G4675|G5209|G5210|G5213"> and <entry n="ἐγώ|G1473|G1691|G1698|G1700|G2248|G2249|G2254|G2257|G3165|G3427|G3450">. I'm not sure there's an easy short-term fix, but these clearly indicate that what we have is not an identifier for a lexeme.
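
As a rough illustration of why these keys are not lexeme identifiers, here is a minimal Python sketch that splits the four `n` values quoted above and builds a reverse index. The function names are hypothetical, and it assumes the `headword|G...` layout seen in these examples holds for every entry.

```python
def parse_entry_key(n):
    """Split an entry's n value into (headword, [Strong's numbers])."""
    headword, *numbers = n.split("|")
    return headword, numbers


# The four n attribute values quoted in the comment above.
KEYS = [
    "λέγω|G2036|G2046|G3004|G4483",
    "εἰ|G1487|G1489|G1490|G1499|G1508|G1509|G1512|G1513",
    "αὐτός|G846|G4571|G4671|G4675|G5209|G5210|G5213",
    "ἐγώ|G1473|G1691|G1698|G1700|G2248|G2249|G2254|G2257|G3165|G3427|G3450",
]


def strongs_to_headword(keys):
    """Reverse index: each Strong's number points at the headword that lists it."""
    index = {}
    for key in keys:
        headword, numbers = parse_entry_key(key)
        for number in numbers:
            index[number] = headword
    return index


def lumped_headwords(keys):
    """Headwords that carry more than one Strong's number."""
    parsed = (parse_entry_key(key) for key in keys)
    return {headword: numbers for headword, numbers in parsed if len(numbers) > 1}
```

All four headwords come back from `lumped_headwords(KEYS)`, and the reverse index holds thirty distinct Strong's numbers for just four entries. A lookup from number to headword works, but the number alone cannot serve as a one-to-one key for the entry.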

@destatez
Contributor

I agree with point one. That was why I was proposing that we use only standard Strong's numbers with the G (or H) prefix, and then for any of Alan's "long" Strong's numbers use a 5-digit number to match his, but prefixed with the letter B (Bunning).

I had forgotten about the second point. For those cases, when I was generating the ugl files, I treated them as undefined Strongs to have the ugl team determine what the correct answer was. I was assuming that there was only 1 Strongs ID for each Greek word.

@mrgreekgeek
Contributor

@jonathanrobie was an agreement ever reached about this issue? I just did a quick search, and it looks like we've got 452 entries that are missing Strongs numbers altogether. That means we have no way to match them to a word in a Bible text with Strongs numbers. I'm presuming some of them are missing because there are "gaps" in the original Strongs numbers that need to be filled in by adopting one of the aforementioned enhancements to Strongs?
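
For anyone who wants to reproduce a search like this, a minimal sketch follows. It is not the query I actually ran. It assumes that entries look like the `<entry n="headword|G1234">` examples quoted earlier in this thread and that an entry with no number simply has the bare headword in `n`. The sample XML, the `<lexicon>` wrapper, and the function name are made up for illustration, and the real file may use XML namespaces, in which case the tag passed to `iter` would need the namespace added.

```python
import re
import xml.etree.ElementTree as ET

# A Strong's-style identifier: G or H followed by digits.
STRONGS = re.compile(r"^[GH]\d+$")

# Illustrative sample only; values should be checked against the real data.
SAMPLE = """<lexicon>
  <entry n="λέγω|G2036|G2046|G3004|G4483"/>
  <entry n="εἴωθα"/>
  <entry n="ἔθω|G1486"/>
</lexicon>"""


def entries_without_strongs(xml_text):
    """Return the headwords of entries whose n value carries no Strong's number."""
    root = ET.fromstring(xml_text)
    missing = []
    for entry in root.iter("entry"):
        headword, *ids = entry.get("n", "").split("|")
        if not any(STRONGS.match(identifier) for identifier in ids):
            missing.append(headword)
    return missing
```

Running `entries_without_strongs(SAMPLE)` returns only the one sample entry that has no number attached.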

Some of the lemmata that are missing Strong's numbers are just a "redirection" to an alternative spelling (for example: εἴωθα vs. ἔθω). In that case, couldn't we just copy the "canonical" lemma's Strong's number to the variant spelling? I would be happy to create a PR for that if you think that would be a suitable solution. (See the 2nd summary below for a preliminary list of 158 words that would qualify for that treatment.)

I'm using this Abbott-Smith data to create a little online lexicon app, and I'm pretty interested in seeing as many bugs/issues worked out as possible. Please let me know how I can help to get this data cleaned up and made more usable.
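
The copy step could look something like the sketch below. It assumes a mapping from each variant spelling to its canonical lemma has already been extracted from the redirect entries; that mapping, the function name, and the sample values (including the number shown for ἔθω) are illustrative and should be checked against the lexicon data.

```python
CANONICAL = "ἔθω"
VARIANT = "εἴωθα"


def fill_from_canonical(strongs_by_lemma, redirects):
    """Copy Strong's numbers from canonical lemmas to variants that have none.

    strongs_by_lemma: dict of lemma -> list of Strong's numbers (may be empty)
    redirects:        dict of variant lemma -> canonical lemma
    Returns (filled copy, list of variants that could not be resolved).
    """
    filled = {lemma: list(ids) for lemma, ids in strongs_by_lemma.items()}
    unresolved = []
    for variant, canonical in redirects.items():
        if filled.get(variant):
            continue  # the variant already has its own numbers; leave it alone
        inherited = filled.get(canonical)
        if inherited:
            filled[variant] = list(inherited)
        else:
            # The canonical lemma is itself missing a number: needs manual review.
            unresolved.append(variant)
    return filled, unresolved


# Illustrative input only.
sample = {CANONICAL: ["G1486"], VARIANT: []}
filled, unresolved = fill_from_canonical(sample, {VARIANT: CANONICAL})
```

One caveat with this approach: a variant that redirects to another variant is only resolved if the second one happens to be processed first, so chains of redirects would need either a second pass or manual handling.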

Thanks!

Lemmata without Strongs numbers (452)
Ἀβειληνή
ἀγαθουργέω
ἀγγέλλω
ἄγγος
ἀγυιά
Ἀδμείν
ἀηδία
ἀθροίζω
Αἰλαμίτης
ἄκρον
ἁλιεύς
ἀλλαχοῦ
Ἄλφα
ἀμφιάζω
ἀμφιβάλλω
ἀμφιέζω
ἄμωμον
ἄν_2
ἀνακυλίω
ἀνάληψις
ἀνάπειρος
ἀναπηδάω
ἀνίλεως
ἀντίληψις
ἀνώγαιον
ἀπασπάζομαι
ἀπειθία
ἁπλόος
ἀποδεκατόω
ἀποκαθιστάνω
ἀποκατιστάνω
Ἅρ
ἀραβών
ἄραγε
ἆράγε
Ἄριος
ἄρνας
Ἀρνεί
ἄρραφος
ἄρρην
ἄρχι-
Ἀσά
ἀσαίνω
Ἀσύγκριτος
ἀτιμάω
αὔξω
αὔρα
αὐχέω
ἀφεῖδον
ἀφθορία
ἀφυστερέω
Ἄχας
Ἀχελδαμάχ
Β
βάρ
Βελίαλ
Γαλλία
γαμίζω
Δ
Δαβίδ
δανείζω
δάνειον
δανειστής
δεκαέξ
δεκαοκτώ
Δελματία
δεξιολάβος
δέον
δέος
δέρρις
δέω_2
δηλαυγῶς
Δία
διάγε
διακαθαίρω
διαλιμπάνω
διαπαρατριβή
διαφανής
διαχλευάζω
διενθυμέομαι
διερμηνεία
διόρθωσις
Δίς
δισμυριάς
δοκιμασία
δόκιμος
δράμω
δυσεντέριον
δύσις
δυσφημία
δύω
δωροφορία
Ἐ
ἔγγιστος
ἐγκακέω
ἐγκαυχάομαι
ἐθέλω
εἰδέα
εἶδον
εἵνεκεν
εἴπερ
εἶπον
εἴπως
εἴρω
εἶτεν
εἴωθα
ἐκβαίνω
ἐκζήτησις
ἐκθαυμάζω
ἐκκακέω
ἐκκόπτω
ἐκκρέμαννυμι
ἐκπερισσῶς
ἐκπηδάω
ἐκσώζω
ἔκτρομος
εκχύννω
ἐλεάω
ἔλεγχος
ἐλεινός
ἕλιγμα
Ἐλισάβετ
ἐλκύω
ἐμπαιγμονή
ἐμπεριπατέω
ἐμπνέω
ἐμπρήθω
ἔνατος
ἐνδόμησις
ἐνενήκοντα
ἐνκαίνια
ἐνκαινίζω
ἐνκακέω
ἐνκατοικέω
ἐνκαυκάομαι
ἐνκεντρίζω
ἐνκοπή
ἐνκόπτω
ἐνκρίνω
ἔνκυος
ἐννεός
ἔννυχα
ἐνορκίζω
ἐξ
ἔξειμι_2
ἐξερευνάω
ἐξέφνης
ἐξόν
ἐξουδενόω
ἐξουθενόω
ἐπάρατος
ἐπέρχομαι
ἐπιείκεια
ἐπικεφάλαιον
ἐπιλείχω
ἐπιοῦσα
ἐπισπείρω
ἐπίστασις
ἐπιτροπεύω
ἐρευνάω
ἑρμηνία
ἑσπερινός
Ἑσρών
εὐθύμως
εὐπάρεδρος
εὐποιέω
εὐροκλύδων
Εὐωδία
ἐφεῖδον
ἔφιδε
ἐφνίδιος
ζαφθανεί
ζβέννυμι
ζηλεύω
Ζμύρνα
Η
ἡμεῖς
ἦμήν
ἡσσάομαι
ἥττων
θεολὸγος
θορυβάζω
Ι
Ἱεριχώ
ἱερόθυτος
Ἰσασχάρ
Ἰτουραία
Ἰωάννης
Ἰωβήδ
Ἰωβήλ
Ἰώδα
Ἰωνάθας
Ἰωσήχ
Ἰωσίας
Κ
καθ᾽ εἶς
καθολικός
καθώσπερ
Καΐφaς
κἀμέ
κάμιλος
κἀμοι
Κανανίτης
Καπερναούμ
καταγράφω
καταδίκη
κατάθεμα
κατακληροδοτέω
κατακύπτω
καταναθεματίζω
καταυγάζω
καταφάγω
κατευλογέω
κατήγωρ
κατοικίζω
κατωτέρω
καυτηριάζω
Κεγχρεαί
κέδρος
κενεμβατεύω
κερέα
κεφαλαιόω
κημόω
κινάμωμον
Κίς
κίχρημι
Κλαῦδα
κλινάριον
Κολασσαί
κολλύριον
Κολοσσαεύς
κόπριον
κορβανᾶς
κοσμίως
κρέμαμαι
κρεπάλη
κρύβω
κρυφαῖος
κυβεία
κυκλεύω
κυλισμός
Λ
λακέω
λαμμᾶ
Λαοδικεία
λεμά
λευκοβύσσινος
λημά
λῆψις
λιμά
λίμμα
λογία
Μ
Μαγαδάν
Μαγεδών
μαγεία
μαζός
Μαθθαθίας
Μαϊνάν
μᾶλλον
μάρτυρ
μασθός
μεῖγμα
μείγνυμι
μέλει
Μελελεήλ
μενοῦν
μεσανύκτιον
μετάληψις
μετατρέπω
μήγε
μηθείς
μήπου
μογγιλάλος
μύλινος
Μύρρα
Μυτιλήνη
Ν
Νεεμάν
νεοσσός
νεώτερος
νίζω
νόος
νουμηνία
Νυμφᾶς
Ξ
Ο
ὁδοποιέω
Ὀζίας
οἰκειακός
οἰκετεία
οἰκοδόμος
οἰκουργός
οἶμαι
οἴσω
ὀλεθρεύω
ὄλεθρος
ὀλιγοπιστία
ὀλίγως
ὀμείρομαι
ὁμίχλη
ὀμόω
ὄπτω
ὀρινός
ὄρνιξ
ὅσγε
ὅσον
ὅτου
οὐθείς
οὐχ
ὀχετός
ὀψία
Π
Πάγος
παιδία
παλιγγενεσία
παμπληθεί
πανδοκεύς
πανδοκίον
πανταχῇ
παραβουλεύομαι
παρακαθέζομαι
παρεδρεύω
παρεμβάλλω
παριστάνω
πατρολῴας
πεζῇ
Πειθώ
πεῖν
πεντηκοστή
περαιτέρω
περιάπτω
περικρύβω
περιραίνω
πετάομαι
πιθός
Πιλάτος
πινακίς
Πισίδιος
πλάνης
πλησίον
ποία
πολυεύσπλαγχνος
πραγματεία
πραϋπαθία
πρεσβευτής
πρεσβύτερος
πρίω
προβάτιον
προείρηκα
προερέω
πρόϊμος
προπάτωρ
προσαίτης
προσανέχω
προσαχέω
πρόσκλησις
προσκολλάω
προσπαίω
πρώιος
πρῶτον
πρώτως
Πυρρός
πως
Ρ
ῥαχά
ῥήσσω
Ῥομφά
ῥοπή
ῥυπαίνω
ῥυπαρεύομαι
Σ
Σαλωμών
Σαμάρεια
Σαρούχ
σειρός
σηρικός
σιρός
σιτίον
Σιχάρ
σκότος_2
σπυρίς
στιβάς
Στοϊκός
συγγενής
συγγενίς
συγγνώμη
συγκ-
συγχ-
συζ-
συλλ-
συμβ-
συμμ-
συμμορφίζω
συμπ-
συμφ-
σύμφορος
συμψ-
συναλλάσσω
συνβασιλεύω
συνεπιτίθημι
συνιστάνω
συνκαθίζω׀συγκαθίζω
συνκατανεύω
συνμορφίζω
σύνοιδα
συνοράω
συνπίνω
συνπίπτω
συνσ-
συνχύννω
συστασιαστής
σφυρόν
σωτήριον
Τ
Ταβιθά
ταπεινόφρων
τετρακόσιοι
τετραρχέω
τεύχω
Τίτιος
Τρεῖς_Ταβέρναι
τρῆμα
τροφοφορέω
τυπικῶς
τυχόν
Υ
ὑπερεκπερισσοῦ
ὑπερεκπερισσῶς
ὑπερλίαν
ὑπεροράω
ὑπολαμπάς
ὑπόλειμμα
ὑπόλιμμα
ὑσσός
ὑφαίνω
Φ
φαιλόνης
φαρμακεῖα
φάρμακον
φημίζω
φίλη
φοινίκισσα
Χ
χέω
χθές
χόος
χρυσοῦς
Χωραζίν
Ψ
ψεύδομαι
ὠτίον
Lemmata with spelling variants and no Strongs numbers (158)
ἀγυιά
Αἰλαμίτης
ἁλιεύς
ἀμφιέζω
ἀνάπειρος
ἀντίληψις
ἁπλόος
ἀραβών
ἄραγε
ἆράγε
Ἄριος
ἄρνας
ἄρραφος
ἄρρην
Ἀσά
ἀσαίνω
Ἀσύγκριτος
αὔξω
ἀφεῖδον
Ἄχας
Δαβίδ
Δελματία
δύω
ἔγγιστος
ἐγκακέω
ἐγκαυχάομαι
ἐθέλω
εἶδον
εἵνεκεν
εἴπερ
εἴπως
εἴωθα
ἐκκακέω
ἐλεινός
Ἐλισάβετ
ἐλκύω
ἐμπρήθω
ἐννεός
ἔννυχα
ἐξ
ἐξέφνης
ἐξόν
ἐξουδενόω
ἐπιείκεια
ἐπιοῦσα
ἐφεῖδον
ἔφιδε
ἐφνίδιος
ἡμεῖς
ἦμήν
ἡσσάομαι
ἥττων
Ἱεριχώ
Ἰτουραία
Ἰωβήλ
Ἰωνάθας
Ἰωσίας
κάμιλος
Κανανίτης
Καπερναούμ
καταφάγω
κατωτέρω
Κεγχρεαί
κερέα
κινάμωμον
Κίς
κίχρημι
Κλαῦδα
Κολασσαί
κολλύριον
Κολοσσαεύς
κρέμαμαι
κρεπάλη
κρύβω
κυβεία
λαμμᾶ
Λαοδικεία
λεμά
λῆψις
λίμμα
μαγεία
Μαθθαθίας
Μαϊνάν
μᾶλλον
μάρτυρ
μασθός
μείγνυμι
μέλει
Μελελεήλ
μεσανύκτιον
μετάληψις
μήγε
μηθείς
μογγιλάλος
Μύρρα
Μυτιλήνη
Νεεμάν
νεοσσός
νεώτερος
νίζω
νόος
νουμηνία
Ὀζίας
οἰκειακός
οἶμαι
οἴσω
ὀλεθρεύω
ὀμόω
ὄπτω
ὀρινός
ὅσον
ὅτου
οὐθείς
οὐχ
Πάγος
παιδία
παλιγγενεσία
παμπληθεί
πανδοκεύς
πανδοκίον
παριστάνω
πεζῇ
περικρύβω
πιθός
Πιλάτος
πλησίον
πολυεύσπλαγχνος
πραγματεία
πρεσβευτής
πρεσβύτερος
πρίω
ῥαχά
ῥήσσω
Σαλωμών
Σαμάρεια
Σαρούχ
σηρικός
σιρός
Σιχάρ
σπυρίς
Στοϊκός
συγγνώμη
συνμορφίζω
συνοράω
σωτήριον
Ταβιθά
τετραρχέω
τεύχω
ὑπεροράω
ὑπόλιμμα
φαιλόνης
φαρμακεῖα
χέω
χθές
χόος
χρυσοῦς
Χωραζίν
ψεύδομαι
