Hedy Language Design
Learning to program is hard for newbies. The goal of the Hedy language is to start with a language that is as simple as possible, and to add more and more syntax as the levels progress. For this we have a number of clear design goals.
1. Concepts are offered at least three times in different forms: Research from writing education has shown that it is best to offer concepts in different forms over a long period of time. Furthermore, it has been shown that a word needs to be read 7 times before it is stored in long-term memory.
2. The initial offering of a concept is the simplest form possible: Previous research has shown that syntax can be confusing for novices. Early levels are thus as syntax-free as possible to lower cognitive load.
3. Only one aspect of a concept changes at a time: In his paper on the spiral approach, Shneiderman argued for small steps in teaching programming, which we follow for Hedy too. This allows us to focus the full attention of the learner on the new syntactic element.
4. Adding syntactic elements like brackets and colons is deferred to the latest moment possible: Previous research in the computer science education domain has shown that operators such as `==` and `:` can be especially hard for novices. In a study with high-schoolers we found that this might be due to their pronunciation. Research from natural language acquisition also indicates that parentheses and the colon are among the last elements of punctuation that learners typically learn. Given the choice between colons and parentheses on the one hand and other elements such as indentation on the other, the latter are introduced first.
5. Learning new forms is interleaved between concepts as much as possible: We know that spaced repetition is a good way of memorizing, and that it takes time to learn punctuation, so we give students as much opportunity as possible to work with concepts before syntax changes.
6. At every level it is possible to create simple but meaningful programs: It is important for all learners to engage in meaningful activities. Our experience in teaching high-school students (and even university CS students) is that learning syntax is not always seen as a useful activity. Students experience a large discrepancy between the computer being smart, for example by being able to multiply 1,910 and 5,671 within seconds, while simultaneously not being able to add a missing colon independently. We anticipate that when the initial syntax is simple, allowing novices to create a fun and meaningful program, they will later have more motivation to learn the details of the syntax.
In its current form, Hedy consists of 18 different levels. The levels loosely follow the lesson series Python in de klas ("Python in the classroom") in such a way that these existing lessons can be executed with Hedy instead of with Python.
At the first level, students can print text. For this, no syntactic elements are needed other than the keyword `print` followed by arbitrary text. Furthermore, students can ask for input from the user using the keyword `ask`. Here we decided to use the keyword `ask` rather than `input` because it is more aligned with what the role of the keyword is in the code than with what it does. Input of a user can be repeated with `echo`, so very simple programs can be created in which a user is asked for a name or a favorite animal, fulfilling Design Goal 6.
At the second level, variables are added to the syntax. Defining a variable is done with the word `is` rather than the equals symbol, fulfilling Design Goal 3 and Design Goal 4.
In level 3 we add the option to create lists and retrieve elements, including random elements from lists with `at`. Adding lists, and especially adding the option to select a random item from a list, allows for the creation of more interesting programs such as a guessing game, a story with random elements (which is an assignment from Python in de klas), or a customized dice.
Level 3 also allows learners to add and remove list elements with a textual syntax: `add animal to animals`.
In Level 4 the first syntactic element is introduced: the use of quotation marks to distinguish between strings and text. In teaching novices we have seen that this distinction can be confusing for a long time, so offering it early might help to draw attention to the fact that computers need information about the types of variables. This level is thus an interesting combination of explaining syntax and explaining programming concepts, which underlines their interdependency. The variable syntax using `is` remains unchanged, meaning that learners can now use both `number is 12` and `name is Hedy`.
In Level 5, selection with the `if` statement is introduced, but the syntax is 'flat', i.e. placed on one line, resembling a regular syntax more:
if name is print
Another feature is the `if pressed` statement, making it possible to link commands to any letter key (a-z) or digit key (0-9) on the keyboard:
if x is pressed print
Else statements are also included, and are also placed on one line, using the keyword `else`:
if name is print else print
This also works together with `pressed`.
In Level 6, students learn to calculate with variables. Therefore addition, multiplication, subtraction and division are introduced. While this might seem like a simple step, our experience taught us that the use of `*` for multiplication, rather than `x`, is a steep learning curve and should be treated as a separate learning goal.
Research with Python novices who are not native English speakers has found the keyword `for` to be a confusing word for repetition, especially because it sounds like the word 'four'. For our first, simplest form, according to Design Goal 2, we opt to use the Quorum syntax `repeat x times`. In this initial form, like the `if`, the syntax is placed on one line:
repeat 5 times print
Repeat can also be used in combination with `pressed`, so that the program will wait for a keypress multiple times before terminating.
After Level 7, there is a clear need to 'move on', since the body of a loop (and also that of an `if`) can only consist of one line, which limits the possibilities of programs that users can create. We assume this limitation will be a motivating factor for learners: rather than 'having to learn' the block structure of Python, they are motivated by the prospect of building larger and more interesting programs (Design Goal 6). The syntax of the loop remains otherwise unchanged as per Design Goal 3, so the new form is:
repeat 5 times
    print 'Hello'
    print 'I am repeated 5 times'
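For comparison, a hand-written Python counterpart of this program is sketched below. It only illustrates how close the block form already is to Python; it is not necessarily the code Hedy generates, and the helper name `repeated_lines` is introduced here purely so that the loop is easy to check.

```python
from typing import List


def repeated_lines(times: int = 5) -> List[str]:
    """Collect the lines that the Hedy program above would print."""
    lines = []
    for _ in range(times):  # counterpart of "repeat 5 times"
        lines.append('Hello')
        lines.append(f'I am repeated {times} times')
    return lines


for line in repeated_lines():
    print(line)
```

The only additions compared to the Hedy form are the round brackets and the colon, which is why those elements can be deferred to the last levels.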
To allow for enough interleaving of concepts (Design Goal 5), we defer the introduction of syntax concepts for now, and focus on more conceptual additions: the nesting of blocks. We know indentation is a hard concept for students to learn, so this warrants its own level (Design Goal 3).
In level 10, learners learn the `for` syntax to loop over the values in a list with `for animal in animals`. This allows the customization of stories, drawings and songs.
Once blocks are sufficiently automatized, learners will see a more Python-like form of the for loop, namely: `for i in range 0 to 5`. This allows for access to the loop variable `i`, which in turn allows for more interesting programs, such as counting to 10. As per Design Goal 3, the change is made small, and to do so (following Design Goal 4), brackets and colons are deferred to a later level, but indentation, which was introduced in Level 8, remains.
Learners are now allowed to use floats and need to place quotation marks around strings to distinguish them from numbers.
In level 13, learners learn about `and` and `or` in `if` statements.
In level 14, learners learn about `<`, `<=`, `>` and `>=` in preparation for while loops.
In level 15, learners are introduced to the while loop. With the previous knowledge of loops and `<=` and `>=`, learners can make basic while loops.
In this level, learners encounter brackets for the first time, because it adds square brackets for list access, which up to now was done with the keyword `at`, following Design Goal 2. This level also explains accessing lists with a numeric index, starting at 1. The code to access a specific value has technically been available since level 2, but until now there was no explanation of how to access a specific value and it is not used in examples (and should maybe be removed?).
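Because Python lists start at 0, an index that starts at 1 has to be shifted somewhere on the way to Python. The helper below, with the hypothetical name `item_at`, is a minimal sketch of that shift; how Hedy actually handles it is not described here.

```python
def item_at(items: list, position: int):
    """Return the element at a 1-based position, as a learner would count."""
    if not 1 <= position <= len(items):
        raise IndexError(
            f"position {position} is outside the list of {len(items)} items"
        )
    return items[position - 1]  # Python itself counts from 0


animals = ['dog', 'cat', 'kangaroo']
first = item_at(animals, 1)
last = item_at(animals, 3)
```

Rejecting position 0 explicitly avoids Python's negative-index wrap-around, which would otherwise silently return the last element.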
To make the step to full Python, learners will need to use the colon to denote the beginning of a block, in both loops and conditionals. Because blocks are already known and practiced over several levels, we can teach learners to use a colon before every indentation.
This level also introduces `elif` to allow for more exciting programs, since just adding a colon does not really create engagement.
Level 18 adds round brackets in `print` and `range` and changes `ask` to `input`. As per Design Goal 4, these are added as late as possible.
All levels allow for the use of comments, and it is up to the teachers to explain their different uses.
Every level of Hedy is essentially a new language which requires its own grammar. Due to the gradual nature of Hedy, however, the grammar of each level is only slightly different from the grammar of the previous one. To avoid massive duplication, grammar code in Hedy is organized in the following manner:
- A `level1.lark` file serving as a base grammar file.
- A `level[1-9+]-Additions.lark` file for every level. Each file describes only the grammar changes compared to the previous level. Addition files can add new grammar rules or override existing ones.
To get the grammar of a concrete level, Hedy takes the grammar of level 1 and consecutively merges all the changes specified in the Addition files until the required level is reached. The final merged grammars for all levels are generated in the `/grammars-Total` folder.
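The merging step can be pictured with a small Python sketch. It assumes, purely for illustration, that each grammar is reduced to a mapping from rule name to rule body; the rule names and bodies below are made up, and the real `.lark` files and merge logic in Hedy are more involved.

```python
from typing import Dict, List

# rule name -> rule body (a simplified stand-in for the contents of a .lark file)
Grammar = Dict[str, str]


def merge_grammars(base: Grammar, additions: List[Grammar], level: int) -> Grammar:
    """Build the grammar of `level` from the level 1 grammar plus the additions of levels 2..level."""
    if level < 1 or level > len(additions) + 1:
        raise ValueError(f"unknown level: {level}")
    merged = dict(base)  # copy, so the base grammar is not mutated
    for addition in additions[: level - 1]:
        merged.update(addition)  # new rules are added, existing rules are overridden
    return merged


# Hypothetical rule sets, invented for this example only.
LEVEL_1 = {"command": "print | ask | echo", "print": '"print" text'}
ADDITIONS = [
    {"command": "print | ask | assign", "assign": 'var "is" text'},  # level 2
    {"command": "print | ask | assign | assign_list",
     "assign_list": 'var "is" text ("," text)+'},                    # level 3
]

level_2 = merge_grammars(LEVEL_1, ADDITIONS, 2)
```

Applying the additions strictly in level order matters: a rule overridden at level 2 may be overridden again at level 3, and the last override must win.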
Hedy has a rudimentary type system created to provide better error messages to end users. The type system performs type inferring, type validation and lookup table enrichment before transpilation happens. Note that if in the future the transpiler still does not require any of the lookup table enrichments done by the type system, type validation and transpiling can run in parallel.
The type system requires as input a lookup table containing the names of all variable definitions, which it later enriches with their inferred types. The supported types are `string`, `integer`, `float`, `list`, `boolean`, `input`, `any` and `none`. The type `any` is used when types cannot be inferred and is ignored in all type validations. The type `input` is a composite data type used to denote user input (retrieved through the `ask` and `input` commands), which means `input` can be multiple types depending on the value the user enters. At the moment, the user input could be interpreted as `string`, `integer` or `float`. The lookup table is also used by the transpiler to differentiate literals from expressions, e.g. the literal 'text' vs a variable called 'text'. Because of that the lookup table does not contain only variable definitions, but also all expressions that need to be escaped, e.g. variable access such as `animals[0]`.
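How an entered value might be classified as one of the types behind `input` can be sketched in Python. The function name `interpret_input` and the order of the checks are assumptions made for illustration, not Hedy's actual implementation.

```python
def interpret_input(value: str) -> str:
    """Classify raw user input as 'integer', 'float' or 'string'."""
    text = value.strip()
    try:
        int(text)
        return "integer"
    except ValueError:
        pass  # not a whole number, try float next
    try:
        float(text)
        return "float"
    except ValueError:
        return "string"  # anything that is not numeric stays text


kinds = [interpret_input(v) for v in ("12", "3.5", "Hedy")]
```

The integer check runs first because every integer literal would also parse as a float.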
The lookup table is created and enriched in two separate steps. The first traversal of the abstract syntax tree puts in the lookup table the entries required by the transpiler along with a reference to the sub-tree needed to infer their type. For example, the line `a is 1` will add the following entry: `{name: 'a', tree: {data='integer', children:['1']}}`.
The second traversal of the abstract syntax tree is performed to infer the types of expressions, store the inferred types of variables in the lookup table, and perform type validation. If during the second step the type system encounters a variable whose type has not been inferred yet, it will use the tree stored in the lookup entry to infer its type. Note that there are valid scenarios in which the lookup entries will be accessed before their type is inferred. This is the case with for loops:
for i in 1 to 10
    print i
In the above case, `print i` is visited before the definition of `i` in the for loop. To mitigate the issue, the lookup entry tree is used to infer the type of `i`. There is a guard against cyclic definitions, e.g. `b is b + 1`.
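The on-demand inference and the cycle guard can be sketched as follows. This is a simplified model under stated assumptions: sub-trees are plain tuples rather than real syntax tree nodes, only a handful of node kinds exist, and the names `Entry`, `infer_entry`, `infer_tree` and `CyclicDefinitionError` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple

Tree = Tuple  # (kind, *children): a simplified stand-in for an AST sub-tree


@dataclass
class Entry:
    name: str
    tree: Tree                  # stored during the first traversal
    type: Optional[str] = None  # filled in during the second traversal


class CyclicDefinitionError(Exception):
    pass


def infer_tree(tree: Tree, table: Dict[str, Entry], visiting: Set[str]) -> str:
    kind = tree[0]
    if kind in ("integer", "float", "string"):
        return kind
    if kind == "var":  # variable access: fall back to the stored entry
        return infer_entry(table[tree[1]], table, visiting)
    if kind == "add":  # simplified rule, no validation of the operands
        left = infer_tree(tree[1], table, visiting)
        right = infer_tree(tree[2], table, visiting)
        return "float" if "float" in (left, right) else left
    return "any"  # type cannot be inferred


def infer_entry(entry: Entry, table: Dict[str, Entry], visiting: Set[str]) -> str:
    if entry.type is not None:
        return entry.type  # already inferred earlier
    if entry.name in visiting:  # guard against e.g. "b is b + 1"
        raise CyclicDefinitionError(entry.name)
    visiting.add(entry.name)
    try:
        entry.type = infer_tree(entry.tree, table, visiting)
    finally:
        visiting.discard(entry.name)
    return entry.type


table = {
    "a": Entry("a", ("integer", "1")),
    "c": Entry("c", ("add", ("var", "a"), ("float", "2.5"))),
    "b": Entry("b", ("add", ("var", "b"), ("integer", "1"))),
}
c_type = infer_entry(table["c"], table, set())
```

Inferring `c` forces the type of `a` to be inferred from its stored tree first, which mirrors how an entry can be accessed before its own definition has been visited.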
We try to craft our error messages along these lines.
- (i) Structure. We aim to follow a consistent structure for all of our error messages. The first sentence informs our users of the issue, while the second sentence offers a suggestion on how to resolve it. This approach helps users quickly understand where to look when something goes wrong and allows them to become accustomed to reading the error messages more efficiently due to their uniform structure.
- (ii) Consistency. We aim to keep our error messages consistent by using uniform wording and maintaining a similar sentence structure and length. Errors should follow the same format, ensuring that the sentences start and end similarly. This consistency also aids the readability and the structure of the error messages.
- (iii) Keeping Them Short. We aim to keep our error messages concise and short while providing sufficient information about the issue and its cause. Given that people often do not read the entire error message, critical information will be presented at the beginning.
- (iv) Specific Errors. We aim to create error messages that are specific enough to help users understand programming concepts. This is particularly important for new programmers, who require more detailed guidance compared to those with prior knowledge. To assist new users effectively, we strive to provide distinct error messages for similar issues, as a generic message such as ‘syntax error’ is not sufficiently informative.
- (v) Language.
  - First, we maintain a positive and encouraging tone, avoiding words like ‘illegal’ and ‘invalid’; we want to blame the computer, not the user.
  - Second, we keep the language simple, minimizing the use of programming terms. Terms should be from the programmer's perspective, not the compiler's, so we avoid words like 'initialization' and use simpler terms such as 'set' or 'declare'.
  - Third, we employ anthropomorphic messages for specific types of errors, implying that computers can think. For example, we might say, 'We detected that...'. We use this type of language in the first sentence of error messages to explain the problem. This approach is beneficial, particularly for younger users, as children tend to use correlation to learn new topics more effectively.
- (vi) Attention. We aim to catch our users' attention by using highlighting and different types of fonts for helpful indications such as a missing command or a typo in one of the commands. Additionally, we specify the line containing the error, and the system highlights the line where the error occurred, making it easily visible.
- (vii) Documentation and Examples. This process focuses on improving the error messages by incorporating feedback and suggestions from users and documenting them (these guidelines are themselves a form of documentation). Additionally, examples named 'Adventures' are provided at the top of the console to guide students and help them avoid making errors.
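The two-sentence structure from guideline (i), together with the line reference from guideline (vi), could be modelled as in the sketch below. The class `ErrorMessage`, the helper `missing_quotes` and the message wording are all invented for illustration; they are not Hedy's actual messages or code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorMessage:
    line: int
    problem: str     # first sentence: what went wrong
    suggestion: str  # second sentence: how to resolve it

    def render(self) -> str:
        # Critical information comes first: the line, then the problem.
        return f"Line {self.line}: {self.problem} {self.suggestion}"


def missing_quotes(line: int, command: str) -> ErrorMessage:
    """Build a specific message instead of a generic 'syntax error'."""
    return ErrorMessage(
        line=line,
        problem=f"We detected that the text after {command} has no quotation marks.",
        suggestion=f"Try putting quotation marks around the text you want to {command}.",
    )


message = missing_quotes(3, "print").render()
```

Keeping the problem and the suggestion as separate fields makes it harder to write a message that skips one of the two sentences.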