What are the ideas for a new standard test SRFI? #3

amirouche · 2020-08-04T07:04:49Z

I have read here and there that many people have ideas for a new test SRFI.

What are your ideas?

ref: https://srfi.schemers.org/srfi-64/
ref: https://srfi.schemers.org/srfi-78/

amirouche · 2020-08-04T07:13:51Z

Here are some thoughts by mdhughes at https://mdhughes.tech/2020/02/27/scheme-test-unit/

shirok · 2020-08-04T07:20:31Z

There are two situations:

* Per-project tests: test frameworks that each project can choose to adopt and write its tests in. Such frameworks can have as many fancy features as they like.
* Conformance tests, such as the tests for each individual SRFI. These tests are expected to be adopted by implementations, so that each implementation can run them as part of its own test suite. Their test interface should therefore be easy to integrate with whatever test system the implementation has chosen.

My concern is mainly the latter. Gauche supports SRFI 64 and SRFI 78 (and (chibi test)) in such a way that, when used with (gauche test), each test form of those SRFIs becomes a thin wrapper around Gauche's own test, so that reporting, success/failure counts, etc. work seamlessly.

We can have a sophisticated test framework for the former, but I'd like it to be noted that it should not be used for the latter kind of tests.

lassik · 2020-08-04T07:22:40Z

Thanks for the links.

The most important thing would be to separate the test runner from the test definition framework, so that tests defined using any framework can be run using any runner.

Test runners can be quite complex, and people don't agree on which one is best. Definition frameworks are much simpler, and can be easily ported to new Schemes.

The currently dominant frameworks are SRFI 64 (A Scheme API for test suites, originally from Kawa) and Chicken's test egg, which was also adapted by Chibi. Chicken test has an almost identical API to SRFI 64. So lots of Scheme tests are using almost the same syntax already.

I'd like to publish a SRFI that contains:

* Only the test definition part from SRFI 64, not the test runner.
* Adds the extra stuff from Chicken/Chibi test (there is not much extra, only a couple of things).
* Adds a few more minor convenience procedures for testing exceptions.

This definition framework could be supported by any existing test runner.
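To make the definition/runner split concrete, here is a minimal R7RS-small sketch, not taken from SRFI 64 or (chibi test): the definition layer only evaluates a test and hands the outcome to whatever reporter procedure is installed in a parameter, and each runner supplies its own reporter. The names `current-test-reporter`, `test`, and `count-failures` are hypothetical.

```scheme
(import (scheme base) (scheme write))

;; Hypothetical hook between definitions and runners. The default reporter
;; prints one line per test; a real runner would install its own.
(define current-test-reporter
  (make-parameter
   (lambda (name passed? expected actual)
     (display (if passed? "PASS: " "FAIL: "))
     (display name)
     (unless passed?
       (display " expected ")
       (write expected)
       (display " got ")
       (write actual))
     (newline))))

;; Definition side: argument order follows (chibi test)'s (test name expected expr).
;; It knows nothing about counting, output format, or filtering.
(define-syntax test
  (syntax-rules ()
    ((_ name expected expr)
     (let ((e expected) (a expr))
       ((current-test-reporter) name (equal? e a) e a)))))

;; One possible runner: silently count failures while running THUNK.
(define (count-failures thunk)
  (let ((failures 0))
    (parameterize ((current-test-reporter
                    (lambda (name passed? expected actual)
                      (unless passed? (set! failures (+ failures 1))))))
      (thunk))
    failures))
```

For example, `(test "addition" 3 (+ 1 2))` prints `PASS: addition` under the default reporter, while `(count-failures (lambda () (test "bad" 4 (+ 1 2))))` prints nothing and returns 1. The same test forms run unchanged under either runner.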

lassik · 2020-08-04T07:27:13Z

For SRFI testing, it would be nice to get the files in this srfi-test repo into a form that many Scheme implementations can import directly (i.e. just copy the .scm file into their repo with no changes). That would make it easy to keep everyone up to date with the latest tests.

I think being able to copy the files with no changes is important. Then it's so easy that people will actually do it. If small changes are needed here and there, it will become a burden.

johnwcowan · 2020-08-04T17:18:54Z

I want to mention something off-topic but which I think will be of interest to readers of this thread. Every Lisper, and probably every programmer who uses a REPL, does informal testing using the REPL. However, such testing does not generate a reusable artifact: you can't use it to find regressions, for example, except by accident.

The idea here is to enhance Scheme REPLs that support REPL commands to help generate such artifacts. A REPL command is something you can input that by convention is not just another Scheme expression to evaluate. For example, in Chicken the REPL commands take the form ,foo, since (unquote foo) is not meaningful Scheme. In Chibi they look like @FOO, which could be a variable to evaluate but generally isn't. Other conventions are used elsewhere; I'll use Chicken's here. A REPL command can take arguments, usually by calling read to collect them.

The idea is that as you noodle around in the REPL, a test script is being created. You start making such a script with a ,testscript command that specifies the pathname of the test script. Then you go along doing your informal tests like this:

> (+ 1 2)
3
> ,ok

This causes a test case to be written to the file which tests that (+ 1 2) => 3. In SRFI-64 that would be (test-equal 3 (+ 1 2)). The ,ok command needs access to the REPL history and to the last value returned. If multiple values are returned, the test case becomes more complicated; if the returned value is #t or #f, we can get clever and use a different test operation.

To this we can add ,error, which tests that the last evaluation throws an exception, and ,endtest, which closes the test script. All other REPL operations including ordinary evaluation have no effect on the test script.
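One way to picture what `,ok` and `,error` would write: a pure helper that turns the last expression read and the value it produced into an SRFI 64 form, plus a writer that appends the form to the script port opened by `,testscript`. This is a hypothetical R7RS-small sketch (the names `ok->test-form`, `error->test-form`, and `append-test!` are made up). It ignores multiple values and assumes the value has a readable external representation, so procedures or records would need other handling.

```scheme
(import (scheme base) (scheme write))

;; Self-evaluating values can be written as-is; anything else is quoted.
(define (literal value)
  (if (or (number? value) (string? value) (char? value) (boolean? value))
      value
      (list 'quote value)))

;; What a hypothetical ,ok command could record for EXPR => VALUE.
;; Booleans get test-assert instead of test-equal.
(define (ok->test-form expr value)
  (cond ((eq? value #t) `(test-assert ,expr))
        ((eq? value #f) `(test-assert (not ,expr)))
        (else `(test-equal ,(literal value) ,expr))))

;; What a hypothetical ,error command could record: EXPR must raise.
(define (error->test-form expr)
  `(test-error ,expr))

;; Append one generated test case to the open test script.
(define (append-test! port form)
  (write form port)
  (newline port))
```

With this, `(ok->test-form '(+ 1 2) 3)` yields `(test-equal 3 (+ 1 2))`, the same form as in the example above.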

arthurgleckler · 2020-08-04T18:44:14Z


I use my own test framework, attached, which was inspired by the original version of JUnit. Its important features are:

* Every test is lexically enclosed.
* Every test or group of tests has a name.
* Tests are first-class objects.
* One can run tests in a mode where only failures are reported. This way, one doesn't have to wade through output in order to figure out whether everything passed, or what failed.
* It's possible to run individual tests or test groups or all defined tests.
* Test groups can be defined concisely.
* Tests only pass if they return the symbol `passed`. That makes it harder for buggy tests to appear to pass when they actually never ran.
* The `assert` macro uses simple heuristics to display the values that were passed to it. This makes it less necessary to have a family of assert macros for different purposes.
* There is an `assert-signals-condition` macro to test that an expression causes a particular condition to be raised.
* Failure reports show the captured continuation of the failing test. This continuation can be used with MIT Scheme's `debug` to walk the stack of the failure, examining variables, etc. This is particularly useful when an unexpected condition is raised during the test.

Here's a transcript of using the `assert` macro. Note how the arguments to assert are displayed in the failure message. That way, it's easy to read most test failure reports to see exactly what went wrong.

    1 ]=> (let ((x '(a b c))) (assert (equal? x '(a b c))))
    ;Value: passed

    1 ]=> (let ((x '(a b c))) (assert (equal? x '(a b c d))))
    ;Assertion failed: (equal? x (quote ...)) (equal? (a b c) (a b c d))
    ;To continue, call RESTART with an option number:
    ; (RESTART 1) => Return to read-eval-print level 1.

    2 error> C-c C-c
    Interrupt option (? for help): C-c C-c
    ;Quit!

    1 ]=>

Here's an example of using the `assert-signals-condition` macro:

    (define (assert-singleton list)
      (assert (and (pair? list) (null? (cdr list)))
              "List must contain exactly one element."))

    (define-test (assert-singleton)
      (assert-singleton '(x))
      (let ((c condition-type:simple-error))
        (assert-signals-condition c (assert-singleton '()))
        (assert-signals-condition c (assert-singleton '(x y)))
        (assert-signals-condition c (assert-singleton 5))))

Here's a procedure definition and a single named test:

    (define (singleton-list? value)
      "Return true iff value is a list of length one."
      (and (pair? value) (null? (cdr value))))

    (define-test (singleton-list?)
      (assert (not (singleton-list? '())))
      (assert (singleton-list? '(1)))
      (assert (not (singleton-list? '(1 2)))))

Here's a procedure definition and a group of tests defined together. Note how the `define-test-group` macro takes a name, a procedure to be run on each set of data to be tested, and a list of rows of data to pass to the procedure.

    (define (length= lst size)
      "Return true iff `lst' has length `size', otherwise #f."
      (let next ((count size) (elements lst))
        (cond ((null? elements) (zero? count))
              ((zero? count) #f)
              (else (next (-1+ count) (cdr elements))))))

    (define-test-group (length=)
      (lambda (expected lst size)
        (assert (eq? expected (length= lst size))))
      '(#t () 0)
      '(#t (a b c) 3)
      '(#f (a b c) 4)
      '(#t (a b c) x))                      ; This will fail.

Below is a transcript of running these tests. Note how `run-single-test` shows the results, pass or fail, of the test whose name was passed to it, whereas `show-failing-tests` runs all defined tests and only shows the results of the failing ones. So `show-failing-tests` is most useful for finding out whether some test is failing and which test it is, whereas `run-single-test` is most useful for repeatedly running a specific test while debugging it or while refactoring the code that it tests.

    1 ]=> (run-single-test '(length=))
    (length= #t () 0) PASSED.
    (length= #t (a b c) 3) PASSED.
    (length= #f (a b c) 4) PASSED.
    #[unit-test 5586 (length= #t (a b c) x)] FAILED
    #[condition 5587 "wrong-type-argument"]
    (type #[condition-type 5588 "wrong-type-argument"])
    (continuation #[continuation 5589])
    (restarts (#[restart 5590 abort]))
    (field-values #(x #f integer-zero? 0))
    (properties #[|1d-table| 5591])
    3 of 4 tests passed.
    ;Unspecified return value

    1 ]=> (show-failing-tests)
    #[unit-test 5586 (length= #t (a b c) x)] FAILED
    #[condition 5592 "wrong-type-argument"]
    (type #[condition-type 5588 "wrong-type-argument"])
    (continuation #[continuation 5593])
    (restarts (#[restart 5594 abort]))
    (field-values #(x #f integer-zero? 0))
    (properties #[|1d-table| 5595])
    ;Unspecified return value

    1 ]=> (debug #@5593)
    There are 18 subproblems on the stack.

    Subproblem level: 0 (this is the lowest subproblem level)
    Expression (from stack):
        (integer-zero? 'x)
    There is no current environment.
    There is no execution history for this subproblem.
    You are now in the debugger.  Type q to quit, ? for commands.

    2 debug> u
    Subproblem level: 1
    Compiled code expression unknown
        #[compiled-return-address 5596 ("list" #x3a) #x14f #x1f34e94]
    Environment created by the procedure: NEXT
     applied to: (x (a b c))
    There is no execution history for this subproblem.

    2 debug> q
    ;Unspecified return value

    1 ]=>

[unit-test.zip](https://github.com/srfi-explorations/srfi-test/files/5105331/unit-test.zip)
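The rule that a test passes only when it returns the symbol `passed` is easy to reproduce outside MIT Scheme. The following is a hypothetical portable R7RS-small sketch, not the attached framework: tests are first-class thunks, `check` (a made-up stand-in for `assert`) returns `passed` on success, and anything else, including a body that falls through without asserting, counts as a failure.

```scheme
(import (scheme base))

;; Stand-in for the framework's assert: returns the symbol passed, or raises.
(define-syntax check
  (syntax-rules ()
    ((_ expr)
     (if expr 'passed (error "Assertion failed:" 'expr)))))

;; Run a first-class test. Only a normal return of the symbol passed counts
;; as success; a raised condition or any other return value is a failure.
(define (run-test thunk)
  (guard (condition (else (list 'failed 'raised condition)))
    (let ((result (thunk)))
      (if (eq? result 'passed)
          'passed
          (list 'failed 'returned result)))))
```

For instance, `(run-test (lambda () (if #f #f)))` is reported as a failure, because a test that never reached an assertion does not return `passed`.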

johnwcowan · 2020-08-05T03:13:23Z

> Test runners can be quite complex, and people don't agree on which one is best. Definition frameworks are much simpler, and can be easily ported to new Schemes.

+1

> * Only the test definition part from SRFI 64, not the test runner.
> * Adds the extra stuff from Chicken/Chibi test (there is not much extra, only a couple of things).
> * Adds a few more minor convenience procedures for testing exceptions.

IMO some further re-engineering is needed. Most of the test-* functions of SRFI 64 exist in order to implicitly supply the equivalence predicate. But SRFI 64 predated parameters and comparators.

I think the Chicken and Chibi test system's use of a parameter current-test-comparator is the Right Thing. The equality predicate of this comparator is used to decide if the expected and actual values are the same. The default comparator's equality predicate is exposed as test-equal?, whose behavior is the same as equal? unless both arguments are inexact numbers, in which case it uses the value of the current-test-epsilon parameter to do a relative error test.
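A hypothetical R7RS-small sketch of the default equality described above. It is not the actual Chicken or Chibi source, the default epsilon of 1e-5 is an assumption, and for portability the parameter `current-test-equality` (a made-up name) holds a plain equality predicate rather than an SRFI 128 comparator.

```scheme
(import (scheme base))

(define current-test-epsilon (make-parameter 1e-5))   ; assumed default

;; equal? unless both arguments are inexact reals; then a relative
;; error test scaled by the larger magnitude.
(define (sketch-test-equal? expected actual)
  (if (and (real? expected) (real? actual)
           (inexact? expected) (inexact? actual))
      (<= (abs (- expected actual))
          (* (current-test-epsilon)
             (max (abs expected) (abs actual))))
      (equal? expected actual)))

;; Stand-in for current-test-comparator: the predicate `test` would consult.
(define current-test-equality (make-parameter sketch-test-equal?))

(define (values-match? expected actual)
  ((current-test-equality) expected actual))
```

Because the predicate is a parameter, a test that needs `eq?` or `eqv?` semantics can use `(parameterize ((current-test-equality eqv?)) ...)` instead of a separate `test-eqv`-style form, which is the re-engineering argued for above.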

That said, all we really need is test, but test-assert and test-not (in Chibi but not Chicken, for some reason) are convenient. Beyond that, test-error (with an error predicate) is necessary, and you can make a case for test-syntax-error (checks a string to see if it can be evaluated).
Finally, a minimal test-grouping facility: test-group with a mandatory name.

And that's all.
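A hypothetical R7RS-small sketch of the two less obvious pieces of that minimal set: `test-error` taking a predicate on the raised object, and `test-group` with a mandatory name. Results are simply collected in a list so that a separate runner could consume them; that list, `report!`, and `raises?` are illustrative only, not part of any existing library.

```scheme
(import (scheme base))

(define current-group-path (make-parameter '()))  ; innermost group first
(define results '())                             ; newest result first

(define (report! name passed?)
  (set! results
        (cons (list (reverse (current-group-path)) name passed?) results)))

;; #t if THUNK raises an object satisfying PRED, #f if it returns normally.
(define (raises? pred thunk)
  (guard (condition (else (if (pred condition) #t #f)))
    (thunk)
    #f))

(define-syntax test-error
  (syntax-rules ()
    ((_ name pred expr)
     (report! name (raises? pred (lambda () expr))))))

;; The pattern requires a name followed by at least one body form.
(define-syntax test-group
  (syntax-rules ()
    ((_ name body1 body2 ...)
     (parameterize ((current-group-path (cons name (current-group-path))))
       body1 body2 ...))))
```

Taking a predicate rather than a condition type keeps `test-error` independent of any particular condition system: `error-object?`, `file-error?`, or an implementation-specific predicate all work the same way.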

amirouche · 2020-08-05T06:45:33Z

I like @arthurgleckler's test framework 😃

The problem with SRFI-64 is that it does not allow running a single test or group of tests separately. That may not be necessary for the small-ish test suites of SRFIs, but in a real-world scenario where the tests take more than a few seconds or minutes to run, it becomes necessary to run one test at a time. Another painful thing about SRFI-64 is that, since tests are not first-class, it is not possible to run a test in a REPL.

I think we should have something along the lines of @arthurgleckler's test framework.

amirouche · 2020-08-05T06:48:35Z

It seems to me having two or more ways to define tests is not a good thing.

sjamaan · 2020-08-05T06:54:38Z

The CHICKEN (and probably Chibi) test library has a way to filter tests using an environment variable. However, as far as I know this just filters the output and still runs all the other tests. I don't know if there's an easy way to support this in a way that it will actually run only the selected tests?

sjamaan · 2020-08-05T06:55:05Z

Making tests first-class would probably solve the filtering problem too

lassik · 2020-08-05T10:23:22Z

> Making tests first-class would probably solve the filtering problem too

OK, let's do it. A SRFI for reflection on the available test suites and test cases, and for making new ones. Kind of like a WSGI-style meeting point that sits between test definition frameworks and test runners. All frameworks and runners could be implemented against this interface.
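A hypothetical R7RS-small sketch of such a meeting point: tests are first-class records in a registry, any definition framework can add to it, and a runner can list the registered tests and decide which ones to run. The record and procedure names are illustrative only; output and counting stay entirely on the runner side.

```scheme
(import (scheme base))

(define-record-type test-case
  (make-test-case name suite thunk)
  test-case?
  (name test-case-name)
  (suite test-case-suite)
  (thunk test-case-thunk))

(define registry '())   ; newest first

;; Definition side: frameworks register tests instead of running them.
(define (register-test! name suite thunk)
  (set! registry (cons (make-test-case name suite thunk) registry)))

;; Reflection side: runners can enumerate what is available.
(define (all-tests) (reverse registry))

(define (tests-in-suite suite)
  (let loop ((ts (all-tests)) (acc '()))
    (cond ((null? ts) (reverse acc))
          ((equal? (test-case-suite (car ts)) suite)
           (loop (cdr ts) (cons (car ts) acc)))
          (else (loop (cdr ts) acc)))))

;; One minimal runner: run the given tests in order and return
;; (name . passed?) pairs. A test passes if its thunk returns a true
;; value without raising.
(define (run-tests tests)
  (let loop ((ts tests) (acc '()))
    (if (null? ts)
        (reverse acc)
        (let ((ok (guard (condition (else #f))
                    (if ((test-case-thunk (car ts))) #t #f))))
          (loop (cdr ts) (cons (cons (test-case-name (car ts)) ok) acc))))))
```

Since selection happens before any thunk is called, running one suite never evaluates the bodies of tests outside it.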

lassik · 2020-08-05T10:26:11Z

We should keep all the strictly runner-related concepts out of it though :) That's the stuff where complexity comes from.

ashinn · 2020-08-27T04:23:19Z

> The CHICKEN (and probably Chibi) test library has a way to filter tests using an environment variable. However, as far as I know this just filters the output and still runs all the other tests. I don't know if there's an easy way to support this in a way that it will actually run only the selected tests?

This doesn't filter output - it only runs the selected tests. It can also be controlled from Scheme via parameters, the init values from env vars are for convenience.
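A hypothetical R7RS-small sketch of that arrangement: the filter lives in a parameter whose initial value comes from an environment variable, and the test body is wrapped in a thunk so that unselected tests are never evaluated at all. The variable name `SKETCH_TEST_FILTER`, the prefix-matching rule, and the procedure names are assumptions for illustration, not the actual (chibi test) or Chicken test interface.

```scheme
(import (scheme base) (scheme process-context))

;; Initial value from the environment (#f if unset); Scheme code can
;; still override it with parameterize.
(define current-test-name-filter
  (make-parameter (get-environment-variable "SKETCH_TEST_FILTER")))

(define (name-has-prefix? prefix name)
  (and (<= (string-length prefix) (string-length name))
       (string=? prefix (substring name 0 (string-length prefix)))))

(define (selected? name)
  (let ((prefix (current-test-name-filter)))
    (or (not prefix) (name-has-prefix? prefix name))))

;; Returns pass, fail, or skip. When the test is skipped, THUNK is never called.
(define (run-if-selected name thunk)
  (cond ((not (selected? name)) 'skip)
        ((thunk) 'pass)
        (else 'fail)))

(define-syntax test
  (syntax-rules ()
    ((_ name expected expr)
     (run-if-selected name (lambda () (equal? expected expr))))))
```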
