Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adapt gambdoc to use the new TSV index #698

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

lassik
Copy link
Contributor

@lassik lassik commented May 27, 2021

This patch is work-in-progress and is done on top of PR #697.

Adapts gambdoc.unix.in and the help procedures in lib/_repl.scm to work with the TSV index.

This doesn't properly handle the case where gambdoc is asked to look for remote documentation on gambitscheme.org; there's enough work as it is that I thought I'd ask for a review before doing that.

@lassik lassik changed the title Done Adapt gambdoc to use the new TSV index May 27, 2021
bin/gambdoc.unix.in Outdated Show resolved Hide resolved
lib/_repl.scm Outdated Show resolved Hide resolved
lib/_repl.scm Outdated Show resolved Hide resolved
lib/_repl.scm Outdated Show resolved Hide resolved
lib/_repl.scm Outdated Show resolved Hide resolved
bin/gambdoc.unix.in Outdated Show resolved Hide resolved
@lassik
Copy link
Contributor Author

lassik commented Oct 11, 2021

@feeley All of the above should be fixed now. Can you review?

One nit is that (##raise-error-exception "no help found for" entry) does not show the entry part when the exception occurs in the REPL, so you don't see which lookup failed.

@feeley
Copy link
Member

feeley commented Oct 11, 2021

It should be (##raise-error-exception "no help found for" (##list entry)).

@lassik
Copy link
Contributor Author

lassik commented Oct 11, 2021

OK, that works:

> (help truncate/)
*** ERROR IN (stdin)@1.1 -- no help found for "truncate/"

@lassik
Copy link
Contributor Author

lassik commented Oct 11, 2021

How do we change gambdoc.bat.windows.in?

@lassik
Copy link
Contributor Author

lassik commented Oct 12, 2021

@feeley Please advise on gambdoc.bat.windows.in so we can get this done in time for release.

{
entry="$GAMBDOC_ARG2_PARAM"
index="$GAMBDOC_ARG3_PARAM"
pattern="$(echo ";$index;$entry" | tr ';' '\t' | tr -d '\n')"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like that this depends on so many things... tr, grep, head, cut... It will be hard to port to another OS (e.g. Windows).

Here's a wild idea (which I think you've had previously)... how about doing that processing in Scheme using the Gambit interpreter, which we know is installed? The only part of the script that is difficult is launching the browser (because the Gambit runtime system may not have a working implementation of open-process).

However there is (##os-shell-command cmd) which uses the C system function in the usual C-based runtime system. See lib/gambit/process/process.scm for details. This can be relied on to work on all operating systems with a shell (even Gambit compiled to Python and JavaScript, but only on top of nodejs, have an implementation of ##os-shell-command).

Come to think of it I may be overly paranoid because gambdoc.bat is itself launched using open-process.

And at that point maybe the whole logic should be encapsulated in a Scheme module. I'm not sure how the "find browser" part would be implemented (and portably). There's still an argument to be made to launch a subprocess so that it doesn't interfere with the running Scheme process too much. On the other hand I think it is more valuable to do everything "in process" to minimize external dependencies (in particular it would be easy to get working with the web REPL that runs in an environment where subprocesses are not available, but opening a new web page is easy).

Would you like to prototype this? Perhaps a Scheme module called _doc or _help that is builtin. The .tsv file could have been preprocessed to be an s-expression. It is best to read the file every time help is called (i.e. not cache the database) so that the heap doesn't bloat permanently.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you like to prototype this?

Yes.

Perhaps a Scheme module called _doc or _help that is builtin.

How about (gambit help)?

The .tsv file could have been preprocessed to be an s-expression.

I chose the .tsv format because it's standard (emitted by makeinfo) while still being very simple. All you need to get the data as Scheme vectors is read-line and then string-split on tabs, 5-10 lines of code altogether.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: string-split is in SRFI 13 which we'll be implementing anyway, and read-line is in R7RS.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, string-split is not actually in SRFI 13. It's in two of the newer string SRFIs. Might be good to add to (gambit string) in any case.

Here's a complete parser:

(define (string-split string char)
  (let loop ((a 0) (b 0) (fields '()))
    (cond ((= b (string-length string))
           (reverse (cons (substring string a b) fields)))
          ((char=? char (string-ref string b))
           (loop (+ b 1) (+ b 1) (cons (substring string a b) fields)))
          (else
           (loop a (+ b 1) fields)))))

(define (parse-tsv)
  (let loop ((entries '()))
    (let ((line (read-line)))
      (if (eof-object? line)
          (reverse entries)
          (let ((fields (string-split line #\tab)))
            (loop (cons (list->vector fields) entries)))))))

(define (writeln x) (write x) (newline))
(for-each writeln (with-input-from-file "gambit.tsv" parse-tsv))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The standard way of doing this in Gambit is

(define (string-split str ch)
  (call-with-input-string
    str
    (lambda (port)
      (read-all port (lambda (port) (read-line port ch))))))

With SRFI 13 it is probably faster to do (untested code):

(define (string-split str ch)
  (let loop ((i (string-length str)) (fields '()))
    (let ((j (string-index-right str ch 0 i)))
      (if j
          (loop j (cons (substring str (+ j 1) i) fields))
          (cons (substring str 0 i) fields)))))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can work on adding an implementation of string-split.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. The following each define string-split, seemingly with mutually compatible signatures:

  • SRFI 130: Cursor-based string library
  • SRFI 140: Immutable Strings
  • SRFI 152: String Library (reduced)

@feeley
Copy link
Member

feeley commented Oct 14, 2021

By the way, the Gambit web REPL uses this code to implement help... see contrib/try/VM.scm :

;; Redefine the documentation browser to open the manual.

(##gambdoc-set!
 (lambda (arg1 arg2 arg3 arg4)
   (##inline-host-statement
    "window.open(@scm2host@(@1@));"
    (##string-append "doc/gambit.html#" arg4))))

So perhaps a "hook" is needed in the _doc module to install the logic for opening a web page. Alternatively, this could be in the _doc module in a (cond-expand ((compilation-target js) ...)). But there needs to be a dynamic check to distinguish JavaScript code running in nodejs (where ##os-shell-command is the right way to open a web page) and directly running in a web page (where the call to window.open(...) is the right way).

@lassik
Copy link
Contributor Author

lassik commented Oct 14, 2021

So perhaps a "hook" is needed in the _doc module to install the logic for opening a web page.

Good idea.

@feeley
Copy link
Member

feeley commented Oct 14, 2021

The SRFI 152 string-split is needlessly complex so I have added ##string-split-at-char and ##string-split-at-char-reversed that have a different interface (somewhat simpler and specialized for a character delimiter). The SRFI 152 string-split can detect the case where the delimiter is a single character string and call ##string-split-at-char in that case. The handling of the "grammar" parameter can be implemented as a pre/post-processing.

@feeley
Copy link
Member

feeley commented Oct 15, 2021

I have experimented with 2 implementations of a lookup procedure to find a key in the gambit.tsv file. The first reads the whole .tsv file and splits each line around tabs. The second only reads as much lines as necessary and uses ##string-suffix=? to match the key. The second is about 4x faster and uses 6x less memory so it should be used.

(define path "doc/gambit.tsv")

(define (lookup1 path key)
  (let ((all
         (map (lambda (line)
                (##string-split-at-char line #\tab))
              (call-with-input-file
                  path
                (lambda (port)
                  (read-all port read-line))))))
    (let loop ((lst all))
      (if (pair? lst)
          (let ((x (car lst)))
            (if (string=? (caddr x) key)
                x
                (loop (cdr lst))))
          #f))))

(define (lookup2 path key)
  (let ((k (string-append "\t" key)))
    (call-with-input-file
        path
      (lambda (port)
        (let loop ()
          (let ((line (read-line port)))
            (if (string? line)
                (if (##string-suffix=? line k)
                    (##string-split-at-char line #\tab)
                    (loop))
                #f)))))))

(pp (time (lookup1 path "with-output-to-file"))) ;; 0.0035 secs, 3.3 MB alloc

(pp (time (lookup2 path "with-output-to-file"))) ;; 0.0009 secs, 0.5 MB alloc

@lassik
Copy link
Contributor Author

lassik commented Oct 16, 2021

What about contrib/GambitREPL/help.scm - should it be merged into the new module?

@feeley
Copy link
Member

feeley commented Oct 16, 2021

The file contrib/GambitREPL/help.scm is specific to the GambitREPL iOS app. But it is a good example of a situation where it would be good to override the "open hyperlink" functionality that is provided by (gambit help). In the GambitREPL case, the procedure (show-help-document docu anchor) defined in contrib/GambitREPL/program.scm.

@lassik
Copy link
Contributor Author

lassik commented Oct 16, 2021

There should probably be a browse-url library (like in Emacs), separate from the help/doc library.

@lassik
Copy link
Contributor Author

lassik commented Oct 16, 2021

(browse-url "http://foo/bar") which tries a list of browsers, and the list is a parameter object which the user can override. Could even be a SRFI.

@feeley
Copy link
Member

feeley commented Oct 16, 2021

A browse-url library would be nice. However I'm not convinced a parameter object for the list of browsers is the right way to go... I'll have to think about it.

@lassik
Copy link
Contributor Author

lassik commented Oct 16, 2021

A parameter object lets the implementation provide a default list of browsers to try, which users can customize at runtime (either permanently or temporarily), and also lets libraries either force or prefer a specific browser. Seems like it would cover all use cases in a reasonably simple way.

@feeley
Copy link
Member

feeley commented Oct 16, 2021

The problem with a parameter object is that it conflates two needs. The Scheme program may be executing code that installs a new binding for the list of browsers, because that is part of the logic of the program, and then it encounters an error so a REPL is started to debug that. But if at that REPL the programmer enters (help ...) he will be using the browser that is setup for the program, not the browser that is setup by the development environment. So there has to be a way to separate both needs... perhaps two parameter objects... but this isn't 100% clear to me.

@feeley
Copy link
Member

feeley commented Oct 16, 2021

Note that there is already the parameter object help-browser which is the browser to use for debugging using the help procedure.

@lassik
Copy link
Contributor Author

lassik commented Oct 22, 2021

@feeley Here's a fledgling _doc module, all untested code. How do we add it to the build system?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants