Add union types to Typed GDScript #737

Open
chanon opened this issue Aug 8, 2018 · 33 comments

Comments

@chanon

chanon commented Aug 8, 2018

Bugsquad edit: This proposal is a superset of #162.

EDIT: Tried to format it according to the 'template'.

Describe the project you are working on:
I was working on a game using Godot. (Have since moved to another engine due to many issues with Godot.)

Describe the problem or limitation you are having in your project:
(Original text)
The main reason I want it is due to Array and Dictionary type not being able to receive null values and I want to be able to use null values as the "default" value for functions.

This is a very common pattern in my pre-existing code and without this ability I'm not sure what to use as "default" values. Empty arrays and dictionaries? That could cause needless memory allocations I think. Also, there could be a difference in meaning between passing an empty array and passing null to a function, for example. Also it is simpler to just use null and compare/check for null.

Dictionary is used as an 'anonymous object' very often, so it makes a lot of sense to allow null. Since the type does not allow it, union types could help.

This is the biggest issue that makes me not able to use Typed GDScript more in my code.

Describe the feature / enhancement and how it helps to overcome the problem or limitation:
Union types would be written as Type1 | Type2 in place of the type.

Describe how your proposal will work, with code, pseudocode, mockups, and/or diagrams:
Examples:

# heal the target (which has its data in a Dictionary) if specified, otherwise if no target specified, heal self
func heal(target: Dictionary | null = null):
	# heal
	...

# a function that receives a parameter that is either an int or a String
func do_something(parameter: int | String):
	# do something with the parameter
	...

# apply damage to a Player character or an Enemy character
func apply_damage(target: Player | Enemy):
	# apply the damage
	...
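
For comparison, here is a minimal TypeScript sketch of how the first two signatures behave in a language that already has union types. The `Stats` type, its `hp` field, the `ownStats` value, and the function bodies are assumptions made up for illustration; they are not part of the proposal.

```typescript
// Hypothetical stand-in for the Dictionary that holds a character's data.
type Stats = { hp: number };

// Hypothetical data for "self".
const ownStats: Stats = { hp: 10 };

// `null` is the default value and means "no target given, heal self".
function heal(target: Stats | null = null): Stats {
    const who = target === null ? ownStats : target;
    who.hp += 5;
    return who;
}

// A parameter that is either a number or a string.
function doSomething(parameter: number | string): string {
    // The union has to be narrowed before type-specific methods are used.
    if (typeof parameter === "number") {
        return "number " + parameter.toFixed(1);
    }
    return "string " + parameter.toUpperCase();
}
```

Passing `null` and passing an empty object stay distinguishable here, which is the distinction the proposal asks for.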

If this enhancement will not be used often, can it be worked around with a few lines of script?:
The workaround is to not specify types and thus lose type safety.

Is there a reason why this should be core and not an add-on in the asset library?:
It is part of GDScript which is core.

@Zylann

Zylann commented Aug 8, 2018

Isn't Dictionary nullable though? Or does this, too, confuse typed GDScript?

@chanon
Author

chanon commented Aug 8, 2018

Isn't Dictionary nullable though? Or does this, too, confuse typed GDScript?

Yeah, I thought it would be nullable, but no:
image

My initial comment and discussion about this:
godotengine/godot#19264 (comment)
godotengine/godot#19264 (comment)

From:
http://docs.godotengine.org/en/3.0/getting_started/scripting/gdscript/gdscript_basics.html?highlight=constructor#built-in-types

The only type that can be null currently is Object.

Looking at that list, for example, you can't define a function that receives a Color and allow sending null instead of a Color to use the default color. Or Vector3 or Rect2 etc.

My other feature requests (just dropping the link here so I don't have to search for it in that crazy long thread again .. will maybe create issues for more of them later)
godotengine/godot#19264 (comment)

@erodozer

erodozer commented Aug 8, 2018

I'd say the handling of nullable should be a separate issue and not handled by union types. For Type safe nullables in other languages like TypeScript and Kotlin, you usually just suffix the type with a ?, which is a nice syntactical trick. If more types should allow nullable other than just Object, then something like that should be available to typed GDScript.

As for Union types, I'm in full support of this, as it's something that languages which do not allow for method overloading pretty much need to have if they're going to allow for types. Python does it this way too, if we want to continue with the ideology of GDScript should be pythonic.

@vnen
Member

vnen commented Aug 8, 2018

I'm not against this, but don't expect it for 3.1. I deliberately made the typing as simple as possible because I knew it would break a lot of stuff (as it did) and complicating it would break even more. We also have to evaluate this carefully, because union types are more prone to runtime errors (though less than a true Variant) and it will lose the benefit of the typed optimizations.

Array and Dictionary type not being able to receive null values

Only Objects can be null. Arrays and Dictionaries are "primitive" types (that is, they are defined inside Variant), so they are not considered objects.

Empty arrays and dictionaries? That could cause needless memory allocations I think.

null is also a Variant, I don't think empty Arrays and Dictionaries allocate more memory than that.

Also, there could be a difference in meaning between passing an empty array and passing null to a function

And that's why null is not allowed. We certainly could add nullable or sum types, but I wouldn't expect an int to be null, and the same logic applies to Arrays and Dictionaries since they're all built-in types.

if we want to continue with the ideology of GDScript should be pythonic.

We don't have this idea though. We try to mimic Python for syntax, just because users expect it (since GDScript is already somewhat similar to Python), and it's easier than the bikeshed discussion most of the time. In fact, I'm not sure what's the plan for GDScript, since godotengine/godot#18698 didn't reach a conclusion yet.

@raymoo

raymoo commented Aug 8, 2018

We could have tagged unions (à la Rust, Scala, Haskell) instead of untagged unions if type safety is an issue. It wouldn't solve the problem of compatibility with old code, though.

@chanon
Author

chanon commented Aug 9, 2018

For Type safe nullables in other languages like TypeScript and Kotlin, you usually just suffix the type with a ?, which is a nice syntactical trick. If more types should allow nullable other than just Object, then something like that should be available to typed GDScript.

Yes, that would also be nice.
How it would look in the example above:

# heal the target (which has its data in a Dictionary) if specified, otherwise if no target specified, heal self
func heal(target: Dictionary? = null):
	# heal
	...

@Hamcha

Hamcha commented Feb 28, 2019

Just wanted to say that I would also love union types, not for nullables but for errors. For example, I'd love to be able to have my parser do something like:

func compileSource(source: String) -> CompiledBlah | SourceError

and if we're going the TypeScript/Haskell route, then being able to define types as unions of other types (type CompileResult = CompiledBlah | SourceError) would be even better.

For now, I'll just stick to not having a type (which is kinda sad; I would at least like to have an any type until better options are available).
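
As a rough illustration of that pattern, this is how the result-or-error union looks in TypeScript today. The fields of `CompiledBlah` and `SourceError`, the toy `compileSource` body, and the `report` helper are all hypothetical.

```typescript
// Hypothetical result types; only the names come from the comment above.
class CompiledBlah {
    constructor(readonly instructions: string[]) {}
}

class SourceError {
    constructor(readonly message: string, readonly line: number) {}
}

type CompileResult = CompiledBlah | SourceError;

// Toy "compiler": splits the source into lines and rejects empty input.
function compileSource(source: string): CompileResult {
    if (source.trim() === "") {
        return new SourceError("empty source", 1);
    }
    return new CompiledBlah(source.split("\n"));
}

function report(result: CompileResult): string {
    // instanceof narrows the union, so each branch only sees its own fields.
    if (result instanceof SourceError) {
        return `error at line ${result.line}: ${result.message}`;
    }
    return `ok, ${result.instructions.length} instruction(s)`;
}
```

The caller cannot reach `instructions` without first ruling out the error case.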

@willnationsdev
Contributor

Having full-on union types seems like overkill to deal with the actual problem of needing to handle nullable / optional parameters in a method. As mentioned, only Objects, not Arrays or Dictionaries, can be null for a reason. Given this, I think the only appropriate solution to this problem would be to add support for method overloading to GDScript. This way, you can re-define a method multiple times with multiple sets of typed parameters and have confidence that the logic you write will apply only to those particular objects.

I suppose, to play Devil's Advocate, that might lead to people creating empty implementations of a method that just accepts null, purely so they can create optional operations...

func use_array(p_array: Array):
    print(p_array)
func use_array(p_array: null):
    pass # do nothing. Why can't we just make p_array nullable and check inside the method?

...but that also seems like more of a problem regarding one's use of the method itself. Why call use_array at all if you are passing a null value into it, i.e. the problem solves itself by modifying one's API to execute the if-check at the time of calling the function rather than inside of it. Now, whether that's a positive change or not (since it takes the parameter-handling logic out of the method, so-to-speak) is up for debate I suppose.

@aaronfranke
Member

@chanon Can you adjust the OP to fit the proposal template?

@chanon
Author

chanon commented Apr 20, 2020

EDIT: ok I will try to adjust it.

aaronfranke transferred this issue from godotengine/godot Apr 20, 2020

@me2beats

me2beats commented Jul 12, 2020

I create a method that gets Array as an argument, but say I'd like it to work with PoolStringArray as well.
Now I can't specify arg type at all.

IMO, in such cases some kind of union type could be really helpful and flexible:

func(arg: Array | PoolStringArray):

@me2beats

me2beats commented Feb 8, 2021

I agree with @willnationsdev #737 (comment) that method overloading (#1571) can be useful in solving problems related to the lack of nullable types, although I anticipate cases where nullable types can still be more convenient (when we have several parameters).

so I could rewrite my previous comment this way (when method overloading would be implemented):

func(arr: Array):
	...

func(arr: PoolStringArray):
	...

although in this case it might cause code duplication if I want to call the same methods for Array and PoolStringArray, for example append() that is:

func(arr: Array):
	...
	arr.append(1)

func(arr: PoolStringArray):
	...
	arr.append(1)

instead of just

func(arr: Array | PoolStringArray):
	...
	arr.append(1)

however nullable / union types also solve some other problems for example now I can't return null if I specify int as return type:

func foo() -> int:
	return null # if something went wrong

I believe nullable like ->int? or union ->int|null would solve the problem.
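
A small TypeScript sketch of the `int | null` return type, for reference. The `findIndex` and `describeIndex` functions are invented for this example, and the compile-time rejection mentioned in the comment assumes the `strictNullChecks` compiler option is enabled.

```typescript
// Returns the position of `needle`, or null if it is absent.
function findIndex(items: string[], needle: string): number | null {
    const i = items.indexOf(needle);
    return i === -1 ? null : i;
}

function describeIndex(items: string[], needle: string): string {
    const i = findIndex(items, needle);
    // Using `i + 1` before this check is rejected under strictNullChecks,
    // because `i` still has the type `number | null` at that point.
    if (i === null) {
        return "not found";
    }
    return `found at position ${i + 1}`;
}
```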

@Bromeon

Bromeon commented Feb 27, 2021

Even if common in JavaScript/TypeScript, I'm not sure untagged unions are good enough practice to justify being a language feature. Parameters like Player | Enemy (or worse, int | String) often indicate a lack of abstraction and require explicit type differentiation inside the function. This often defeats a main advantage of static typing in the first place. The problem with untagged union types is that they are not typesafe, and static analysis can no longer detect the possible methods for autocompletion.

Instead, I agree with @raymoo that if unions are supported, we should aim for something like Rust enums, maybe simplified (i.e. a variant type). Rust provides the best and most concise implementation of tagged unions I've seen so far. The pattern matching makes it easy to use, but is not strictly necessary -- however there should be a way to enforce checking the type tag before accessing the variable.

The main difference would be that the union type gets a name, with named fields. For example:

variant Target:
   enemy: Enemy
   player: Player

func apply_damage(target: Target):
   pass

The advantage is that this works even when the same type is used multiple times, while the a | b notation already breaks down:

variant Cash:
   euro: int
   dollar: int
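
To make the difference concrete, here is a hedged TypeScript sketch. An untagged `number | number` is just `number`, so the currency would be lost; a tag field keeps the two cases apart. The `kind` and `amount` field names and the `formatCash` helper are assumptions for illustration.

```typescript
// Both cases carry a number; the `kind` tag is what tells them apart.
type Cash =
    | { kind: "euro"; amount: number }
    | { kind: "dollar"; amount: number };

function formatCash(cash: Cash): string {
    // Switching on the tag covers every case of the union.
    switch (cash.kind) {
        case "euro":
            return `${cash.amount} EUR`;
        case "dollar":
            return `${cash.amount} USD`;
    }
}
```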

Also, I think nullable types are completely independent of unions and occur much more frequently. They also deserve a special syntax T?, because T | null is just verbose and doesn't add anything. A null type on its own can only ever have one value, so carries no information -- there's thus no need to make it a static type and allow parameter: null. There's already void to signal "no return value".

Kotlin is a very good inspiration for nullable type design. Java, C#, and even more so JS and PHP are great examples of how not to do it.

@mnn

mnn commented Feb 28, 2021

The problem with untagged union types is that they are not typesafe, and static analysis can no longer detect the possible methods for autocompletion.

Neither of these is true. Just look at TypeScript. If you try to call a method which is only available on one member of a union, compilation fails. And to the second point - IntelliJ IDEA and VSCode suggest only available methods/fields, so before you do narrowing, you get methods/fields which are the same (common) for all members of the union; after narrowing you get suggestions only for the narrowed type.

class Enemy {
    die() {}
    applyDamage() {}
}

class Player {
    killTarget(x: Enemy) { x.die(); }
    applyDamage() {}
}

type Entity = Enemy | Player;

const testEnemy = new Enemy();

const applyDamage = (target: Entity) => {
    target.applyDamage(); // correctly no error, only suggested method is applyDamage
    target.die() // correctly error: Property 'die' does not exist on type 'Player'.
    if (target instanceof Player) {
        // only methods for Player are suggested
        target.killTarget(testEnemy); // correctly no error
    } else if (target instanceof Enemy) {
        // only methods for Enemy are suggested
        target.die() // correctly no error
    }
}


type PrimitiveUnion = string | number;

const g = (x: PrimitiveUnion): string => {
    // primitive types are type-safe as well
    switch (typeof x) {
        case 'string':
            x.toFixed(2); // correctly error (number method): Property 'toFixed' does not exist on type 'string'.
            return x.toUpperCase();
        case 'number':
            x.toUpperCase(); // correctly error (string method): Property 'toUpperCase' does not exist on type 'number'.
            return x.toFixed(2);
    }
}

Code is a bit cumbersome without pattern matching, but it is type safe and autocompletion works well.

I too like tagged unions better, here's the cash example type in Haskell:

data Cash = Euro Int | Dollar Int

I personally don't think it's a great type - IMO having a currency type and amount separate works better, but it's just an example.

@Bromeon

Bromeon commented Feb 28, 2021

If you try to call a method which is only available on one member of a union, compilation fails. And to the second point - IntelliJ IDEA and VSCode suggest only available methods/fields, so before you do narrowing, you get methods/fields which are the same (common) for all members of the union; after narrowing you get suggestions only for the narrowed type.

Ah, interesting, thanks for the clarification! Out of curiosity, do you know if the inspection is also "deep" in the sense that not only method names, but also parameter types (when typed) of the methods are checked?

class Enemy {
    applyDamage(damage: float) {}
}

class Player {
    applyDamage(damage: int) {}
}

const applyDamageUntyped = (target: Enemy | Player, damage /* untyped/any */) => {
    target.applyDamage(damage); // should compile...?
}

const applyDamage = (target: Enemy | Player, damage: int | float) => {
    target.applyDamage(damage); // should not compile...?
}

Yep, currency is not the best example, especially for games 🙂
Variants get mostly interesting if you have multiple options, some of which share the same type, but not all.

If we go the Rust route, variant types are an extension of C enums -- so they can have a simple type tag without extra data, or they can associate fields with that enum. Sticking to existing GDScript syntax, such "without data" variants could have the type void, but it's very well possible that we come up with a better syntax for this.

variant TileType:
   empty: void
   full: void
   diamonds: int  # stores the amount of diamonds in the tile
   switch: bool # stores whether the switch is pressed or not

or:

class DiamondTile:
   var amount: int

class SwitchTile:
   var enabled: bool

variant TileType:
   empty: void
   full: void
   diamonds: DiamondTile
   switch: SwitchTile

In Rust, it would look like this, which is clearer and more expressive, since the fields have names but don't require an external type:

enum TileType {
   Empty,
   Full,
   Diamonds { amount: i32 },
   Switch { enabled: bool },
}

@nukeop

This comment was marked as abuse.

@kconner

kconner commented Sep 5, 2021

I'll offer a word of support for sum types, algebraic data types, tagged unions, sealed classes, or anything you want to call them. By modeling data with sum types in combination with product types, you can shrink-wrap a type to outline only valid data possibilities and offload significant programmer responsibilities to the compiler.

A product type is perfect for expressing Vector3, because all possible combinations of field values are valid. In other words, the fields are totally orthogonal. But for types where some combinations of field values are possible yet invalid, the compiler only does half the job, and the programmer must make sure those combinations remain unused.

For instance, if you've got a GUI screen that shows data you load asynchronously, you might represent that loading data as a product type with nullable fields.

// Product type representing an asynchronously-loaded high score list
struct RemoteLeaderboard {
    var leaderboard: Leaderboard?
    var myRank: Int?
    var error: Error?
}

Here it would not make sense to have both fetched the leaderboard successfully and gotten an error while fetching the leaderboard. At least one of them should be nil. It also wouldn't make sense for the leaderboard to not be loaded yet, but for my own rank to be known already. Yet those cases are allowed by the compiler. A programmer who creates an instance must create only a valid case, or code defensively with runtime checks and handle failures. A programmer who consumes the type must know which cases they can ignore, make assumptions, or again code defensively. Sum types address these problems. With both product and sum types, the compiler lets you make invalid cases impossible by expressing exactly what's valid.

The valid major cases are that you are still loading, or you have the data, or you got an error. And only if we did get the data, maybe I'm ranked and maybe not. We can express that with a sum type of three cases, where each case can include a product type:

// Sum type representing an asynchronously-loaded high score list
enum RemoteLeaderboard {
    case loading
    case done(leaderboard: Leaderboard, myRank: Int?)
    case failed(error: Error)
}

These examples have been in Swift, a close relative of Rust in this regard. If you've done this kind of data modeling for some time in a language that supports it well, you just don't want to go back. Returning to modeling data with only product types feels like working with blunted tools.
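
The same three-case model can also be approximated in a language with untagged unions by adding a discriminant field. Below is a rough TypeScript sketch; the shape of `Leaderboard`, the `state` tag, the use of a plain string for the error, and the `renderLeaderboard` function are assumptions for illustration.

```typescript
// Hypothetical payload type for the example.
type Leaderboard = { entries: string[] };

// Each case carries only the fields that are valid in that case.
type RemoteLeaderboard =
    | { state: "loading" }
    | { state: "done"; leaderboard: Leaderboard; myRank: number | null }
    | { state: "failed"; error: string };

function renderLeaderboard(remote: RemoteLeaderboard): string {
    switch (remote.state) {
        case "loading":
            return "Loading...";
        case "done": {
            // `leaderboard` and `myRank` are only reachable in this branch.
            const rank = remote.myRank === null ? "unranked" : `rank ${remote.myRank}`;
            return `${remote.leaderboard.entries.length} entries, ${rank}`;
        }
        case "failed":
            return `Error: ${remote.error}`;
    }
}
```

A value that has both a leaderboard and an error cannot be constructed with this type, which is the point made above.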

In recent years, mainstream app development has taken certain lessons from functional programming. Game development has left some of those opportunities on the table, and this is one of them. Godot could take advantage and gain some true diehard users.


On the topic of nullable types, since that's how this thread started: If you combine sum types and generics, you can offer Optional<T>, whose cases are .none and .some(T). On that basis anything can be nullable, and nil does not have to be a zero pointer sentinel value. In Swift, let x: Int? = nil is syntactic sugar for let x: Optional<Int> = Optional<Int>.none.
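
A minimal sketch of that idea in TypeScript, with a generic sum type standing in for Swift's Optional. The `kind` tag and the `none`, `some`, and `unwrapOr` helpers are hypothetical names chosen for this example.

```typescript
// Optional<T> as a sum type: either no value, or a wrapped value of type T.
type Optional<T> = { kind: "none" } | { kind: "some"; value: T };

function none<T>(): Optional<T> {
    return { kind: "none" };
}

function some<T>(value: T): Optional<T> {
    return { kind: "some", value };
}

// Returns the wrapped value, or `fallback` when there is none.
function unwrapOr<T>(opt: Optional<T>, fallback: T): T {
    return opt.kind === "some" ? opt.value : fallback;
}

// Because "none" is an ordinary case rather than a sentinel, nesting works:
// an outer "some" can hold an inner "none".
const nested: Optional<Optional<number>> = some(none<number>());
```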

@julian-a-avar-c

This comment was marked as off-topic.

@Calinou

This comment was marked as resolved.

@dcallus

dcallus commented Mar 14, 2023

Yup. Got to agree. Union types are the best thing in TypeScript & would love to see them in GDScript.

I think one of the best use cases would be for dropdowns.

'Up' | 'Down' | 'Left' | 'Right' could be a union type exported to a dropdown, and you would get the string back in return depending on selection, rather than a number like in an Enum.

@vnen
Member

vnen commented Mar 14, 2023

@dcallus

'Up' | 'Down' | 'Left' | 'Right' could be a union type exported to a dropdown, and you would get the string back in return depending on selection, rather than a number like in an Enum.

Note that you can already do:

@export_enum("Left", "Right", "Up", "Down")
var direction: String

Which exports a dropdown in the inspector and the result is a string.

I know this does not cover what is asked in this proposal, but it is something you can use in the meantime.

@nukeop

This comment was marked as abuse.

@Bromeon

Bromeon commented Mar 14, 2023

Please don't bump issues without contributing significant new information. Use the 👍 reaction button on the first post instead. [...]
But who does it help to bump an issue? All it does is notify everyone involved who are watching this

There's always the chance that a contribution in the thread is not interesting for you personally. There's no GitHub setting like "only notify me on comments from org members" or similar, so you'll have to live with it or write a bot. However, your criterion "without contributing significant new information" does not apply here: dcallus mentioned the dropdown use case. I do agree that pure "+1" style comments are not helpful.

Re-initiating an old discussion is also a good way to express "there is still interest in this feature", rather than "it used to be a big problem in Godot 3.1, but no longer applies". From the number of upvotes alone, it's not visible when they happened. Furthermore, thread activity is also one way to sort issues (through "Recently updated").

@nukeop

This comment was marked as abuse.

@vnen
Member

vnen commented Mar 15, 2023

It's hard to assess what the community wants because the community is just too big. This repository is one attempt to concentrate the needs of the community. Keep in mind that just because you find something essential, it might not be the case for the vast majority of other users and thus will hardly be a priority.

There are also frequent meetings with the maintainers to review proposals (which was put on hold to focus on the 4.0 release but will resume soon). Adding a 👍 reaction does help because we can sort issues by reaction as well and make those more likely to be discussed sooner. Bumping for the sake of it won't help, it will just annoy the subscribers.

That said, detracting from the proposed subject is just going to get this discussion locked, which we prefer to avoid. There are other places to discuss the workflow of proposals and pull requests. Putting this here won't help as it won't be the place we'll go looking if we want to search for suggestions, and only a limited number of people will see this here anyway.

So let's keep the discussion on topic, please.

@mlp1802

mlp1802 commented Sep 23, 2023

Just my two cents: I'm extremely interested in features like this. As a Haskell and Rust developer, I find purely object-oriented imperative languages extremely cumbersome to work with when one has to express logic more complicated than "if a>b then".
I believe that a captivating language is what will keep people using the engine for years and years, not support for the latest and greatest graphics feature, especially in the indie dev world where many more or less make games as a hobby. And a hobby should be enjoyable.
Many start out as novice programmers, and at that point GDScript is just fine, but as they evolve they will find out that it is also very limited in terms of expressing logic.
They are going in the right direction though, with pattern matching on arrays and the .map() and .filter() methods, so who knows what the future will bring. Just adding myself to the group of "interested" people. Let's keep this thread alive.

@julian-a-avar-c

julian-a-avar-c commented Sep 28, 2023

I'm assuming this includes generic types? Array[String | null]. But that's the case where T1 and T2 (T1 | T2) do NOT share types. It appears that String and null are not of type Variant, which was my initial guess. It seems like that's just an internal representation note.

Here is an example of unions where the type hierarchy is not shared:

class Player: pass
class Enemy: pass

var target_entities: Array[Player | Enemy] = []

What to do...

Example where the type hierarchy IS shared:

class Entity: pass
class Player extends Entity: pass
class Enemy extends Entity: pass

var target_entities: Array[Player | Enemy] = []

Then it would be much simpler, I think, since internally you could say that target_entities is of type Array[Entity] (this is something that can already be done IF the type hierarchy is shared: This is allowed var target_entities: Array[Entity] = [Player.new(), Enemy.new()]), and restrict its usage to type Player and Enemy through some sort of internal annotation. This same method could technically also be used for types that do not share the same hierarchy; imo it would be far messier: (1) make all types virtually inherit Variant, then strip it out at runtime, because it's probably a performance overhead only for the purpose of this issue, (2) type algebra.

If you want a solution RIGHT NOW, do that, it's annoying... but it works. Contrived example:

class Fish:
	var name = "Fish"
class Tuna extends Fish:
	func _init():
		name = "Tuna"
class Cod extends Fish:
	func _init():
		name = "Cod"

func _ready():
	var fish_can: Array[Fish] = []
	fish_can.push_back(Tuna.new())
	fish_can.push_back(Cod.new())
	# I could also put it inline, but just wanted to show that everything works as expected
	# This includes using other methods
	
	print(fish_can.map(func(fish): return fish.name))
	# Prints `["Tuna", "Cod"]`.

This could also be done for non generics:

var fish: Fish = Tuna.new()
fish = Cod.new()

Well, that's the easy way. If we (sigh I wish I had the time or the wants) are going to do this properly (according to my own definition), then I suggest we expand it to some sort of rigorous type algebra; off the top of my head that would mean adding a &. Let's say we had a class Swimming: var speed: float and a class Breathing: var oxygen_left: int, then we could have a var player: Player & Swimming & Breathing. Under this model, variable names may clash: if there are name clashes then we do not allow the &, or we could have the last item override the field type:

class Entity:
    var name: String
    var id: int
class Player:
    var name: PlayerName # Where `PlayerName` is an inner class.
    var speed: float

var player1: Entity & Player = new() # No type is indicated, so it is inferred instead
# Options:
# 1. Error. Types have a field of the same name with clashing types.
# 2. The last (or first) type definition for a field is used. I believe Scala does this.
#   - Last: { name: PlayerName, id: int, speed: float }
#     - I prefer this.
#   - First: { name: String, id: int, speed: float }
# That's all I can come up with right now.

But that's a bit off topic.
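
For what the & operator could mean in practice, here is a small TypeScript sketch using its intersection types, limited to the non-clashing case. The `Named` type, the `Diver` alias, the field names, and the `tick` function are made up for this example.

```typescript
// Hypothetical capability types modelled on Swimming and Breathing above.
type Swimming = { speed: number };
type Breathing = { oxygenLeft: number };
type Named = { label: string };

// A value of the intersection must provide every field of every part.
type Diver = Named & Swimming & Breathing;

const diver: Diver = { label: "P1", speed: 2.5, oxygenLeft: 30 };

// Accepts anything that can at least swim and breathe.
function tick(body: Swimming & Breathing): number {
    body.oxygenLeft -= 1;
    return body.speed * body.oxygenLeft;
}
```

A `Diver` can be passed to `tick` because it satisfies both parts of the parameter type.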

@olson-sean-k

olson-sean-k commented Jan 2, 2024

I'm a novice Godot user, but I've been spending time with GDScript and have definitely encountered problems and data that are best modeled with the "or" relationship of sum types. Glad to find that this is being discussed! 😄

Many of the proposed examples in this issue so far seem to describe anonymous sum types, where the sum type and/or its variants are not explicitly named. Anonymous sum types can be very useful and convenient, but I feel a more explicit design may be a better place to start. Importantly, explicit sum types interact better with pattern matching and probably require less varied syntax. Perhaps enum can be used for this purpose, as there is already precedent for this in other programming languages and enums already describe distinct data types in GDScript depending on the specific syntax (i.e., int vs. Dictionary).

For example, perhaps the state of a mob can be described using something like this explicit syntax:

enum MobState:
    IDLE:
        pass
    ATTACKING:
        var target: Node2D
        var power: int

var state = MobState.ATTACKING:
    target = %Player1
    power = 1

match state:
    MobState.ATTACKING { var target }:
        print("changing mob target")
        target = %Player2
    _:
        print("mob isn't attacking")

In this example, MobState is an enum sum type and each of its variants may have any number of fields (including none at all). The type, its variants, and its fields are all explicitly named. This is very flexible though a bit verbose. The { .. } syntax is used to reference fields and bring them into scope, much like when matching against arrays and dictionaries in GDScript today. Reading and writing fields requires pattern matching, which is critical for consistency and correctness.

This of course composes well with product types. Combining this kind of sum type with inner classes (or even proposed structs) could look something like this:

class Mob:
    var state: MobState
    var hp: int:
        set(value):
            hp = max(0, value)

Here, the Mob type structurally expresses that a mob is composed of { HP and { nothing or { target and power } } }.

Anonymous sum types can support this kind of construction and pattern matching syntax too, but it may require some specific knowledge of the types involved. In particular, the syntax of initialization and pattern matching of anonymous variants of the same type can get quite complex and/or bespoke, such as in types like T1 | T1 | T1. Note that sum types with variants that contain the same types of data are useful and worth supporting! However, such sum types are unwieldy when the variants are not named. There are various ways to support this, but I think punting on anonymous sum types, at least at first, may be a good choice.

Thanks for all the thought and discussion! I'm excited about something like this making it into GDScript at some point.

@unfallible

I would like to make a case for something similar to anonymous sum types, which I'll call "algebraic type hints". I would prefer these because I think explicit sum types, which I'm just going to call "tagged unions", would integrate poorly with Godot's gradual typing system and require a generic system to use properly. On the other hand, algebraic type hints would allow users to implement their own tagged unions if they want.

Algebraic Type Hints

I agree with @chanon and others that GDScript would be massively improved by allowing users to safely define variables with the form T1 | T2. I also like @julian-a-avar-c's suggestion of allowing more complex typehints with an & operator, although I think & types would probably be more useful for expressing internal engine logic than game logic.

I'm wondering if we can accomplish this by taking advantage of the fact that all data is passed around the engine through a Variant type, which basically is a tagged union that happens to be capable of holding data of any type. My thought is that we could leverage this feature by drawing a distinction between what I'm going to call a Type and a TypeHint. A Type is a property of a value at runtime. Every value always has exactly one Type, which is queried via the is operator. However, because values do not exist at compile time, variables, function return types, signal parameters, etc. cannot be said to be compiled with a Type. Instead, they are declared with type hints, which constrain the possible Type of value which that variable, function return type, signal parameter, etc. can hold. A slightly simplistic way of thinking of the relationship between the Type and TypeHint classes is that a Type defines the set of properties, methods, signals, etc. that a value has, while a TypeHint describes a set of Types that a variable may have. Here's some pseudo code expressing the relationship between the two interfaces. The comments about traits are referring to #6416:

class Type {
	// Classes and Traits are both types. The "parents" variable represents the Type's parent class and any implemented traits. 
	Set[Type] parents;

	bool is(Type other) {
		// An instance of this type is an instance of the other type iff it implements every interface the other type implements
		if (other == this) {
			return true;
		}
		for (Type parent : parents) {
			if (parent.is(other)) {
				return true;
			}
		}
		return false;
	}
}

abstract class TypeHint {
	// Determine whether a variable with this type hint can hold a value of Type t
	abstract bool can_hold(Type t);
	abstract bool is_assignable_to(TypeHint other);
}

class BasicTypeHint : TypeHint {
	Type this_type;
	bool can_hold(Type t) {
		return t.is(this_type);
	}
	
	bool is_assignable_to(TypeHint other) {
		return other.can_hold(this_type);
	}
}
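
For anyone who wants to poke at these rules, here is a minimal runnable sketch of the pseudocode above in Python. It is only an illustration of the logic, not a proposal for how the engine would implement it. The method is_a() stands in for is() because `is` is a Python keyword, and the three-type hierarchy at the bottom is a simplified, hypothetical stand-in for Godot's real class tree.

```python
from abc import ABC, abstractmethod


class Type:
    """Runtime type: a class or trait, plus the types it directly extends/implements."""

    def __init__(self, name, parents=()):
        self.name = name
        self.parents = tuple(parents)

    def is_a(self, other):
        # Mirrors Type::is() from the pseudocode (renamed: `is` is a Python keyword).
        if other is self:
            return True
        return any(parent.is_a(other) for parent in self.parents)


class TypeHint(ABC):
    """Compile-time constraint on which Types a variable may hold."""

    @abstractmethod
    def can_hold(self, t): ...

    @abstractmethod
    def is_assignable_to(self, other): ...


class BasicTypeHint(TypeHint):
    def __init__(self, this_type):
        self.this_type = this_type

    def can_hold(self, t):
        return t.is_a(self.this_type)

    def is_assignable_to(self, other):
        return other.can_hold(self.this_type)


# Simplified, hypothetical hierarchy for illustration only.
NODE = Type("Node")
NODE2D = Type("Node2D", [NODE])
SPRITE2D = Type("Sprite2D", [NODE2D])

node_hint = BasicTypeHint(NODE)
sprite_hint = BasicTypeHint(SPRITE2D)

print(node_hint.can_hold(SPRITE2D))             # True
print(sprite_hint.is_assignable_to(node_hint))  # True
print(node_hint.is_assignable_to(sprite_hint))  # False
```
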

My knowledge of Godot's source code isn't very strong yet, but based on my understanding of the GDScript module's source code, it looks like my Type class roughly corresponds to the GDScriptParser::DataType struct. My Type::is() pseudo-code looks like it is describing roughly the same behavior as the GDScriptAnalyzer::check_type_compatibility() function. The TypeHint class has no obvious equivalent in the source, and my guess is that a major hurdle for implementing an algebraic type hint system would be changing the GDScriptParser::Node class to associate GDScriptParser::Node instances with a TypeHint instead of a DataType.

Anyway, the BasicTypeHint class is not particularly interesting by itself, since it's just replicating existing functionality. However, the hope would be that such a class could be used as a base case for other TypeHint classes. These classes would be composable, allowing the expression of more complex type hints if necessary. The other hope is that this composability would make these type hints easy to parse and feed to the analyzer. Here's pseudo-code expressing what the basic logic for | and & type hints might look like:

// OrTypeHint is used to evaluate type hints of the form "T1 | T2"
class OrTypeHint : TypeHint {
	Set[TypeHint] args;
	// OrTypeHint A can hold values of Type B iff one of A's args can hold values of Type B
	bool can_hold(Type t) {
		for (TypeHint arg: args) {
			if (arg.can_hold(t)) {
				return true;
			}
		}
		return false;
	}
	
	// OrTypeHint A is assignable to TypeHint B iff each of A's args is assignable to B
	bool is_assignable_to(TypeHint other) {
		for (TypeHint arg: args) {
			if (!arg.is_assignable_to(other)) {
				return false;
			}
		}
		return true;
	}
}

// An AndTypeHint could be used to evaluate type hints of the form "T1 & T2". It would be similar to OrTypeHint, but:
// - AndTypeHint A can hold values of Type B iff each of A's args can hold values of Type B
// - AndTypeHint A is assignable to TypeHint B iff one of A's args is assignable to B
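
A rough Python sketch of both composite hints follows, with AndTypeHint written out using the two rules from the comments above. The class names follow the pseudocode; the Damageable trait and the Enemy class are hypothetical examples, and is_a() again stands in for is().

```python
class Type:
    def __init__(self, name, parents=()):
        self.name, self.parents = name, tuple(parents)

    def is_a(self, other):
        return other is self or any(p.is_a(other) for p in self.parents)


class BasicTypeHint:
    def __init__(self, this_type):
        self.this_type = this_type

    def can_hold(self, t):
        return t.is_a(self.this_type)

    def is_assignable_to(self, other):
        return other.can_hold(self.this_type)


class OrTypeHint:
    """T1 | T2: holds a value if any arg can hold it."""

    def __init__(self, *args):
        self.args = args

    def can_hold(self, t):
        return any(arg.can_hold(t) for arg in self.args)

    def is_assignable_to(self, other):
        # Every alternative must fit into the target.
        return all(arg.is_assignable_to(other) for arg in self.args)


class AndTypeHint:
    """T1 & T2: holds a value only if every arg can hold it."""

    def __init__(self, *args):
        self.args = args

    def can_hold(self, t):
        return all(arg.can_hold(t) for arg in self.args)

    def is_assignable_to(self, other):
        # One satisfied constraint is enough to fit into the target.
        return any(arg.is_assignable_to(other) for arg in self.args)


# Hypothetical types for illustration.
INT, STRING = Type("int"), Type("String")
NODE2D, DAMAGEABLE = Type("Node2D"), Type("Damageable")
ENEMY = Type("Enemy", [NODE2D, DAMAGEABLE])

int_or_string = OrTypeHint(BasicTypeHint(INT), BasicTypeHint(STRING))
node_and_damageable = AndTypeHint(BasicTypeHint(NODE2D), BasicTypeHint(DAMAGEABLE))

print(BasicTypeHint(INT).is_assignable_to(int_or_string))           # True
print(int_or_string.is_assignable_to(BasicTypeHint(INT)))           # False
print(node_and_damageable.can_hold(ENEMY))                          # True
print(node_and_damageable.can_hold(NODE2D))                         # False
print(node_and_damageable.is_assignable_to(BasicTypeHint(NODE2D)))  # True
```
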

The TypeHint class's is_assignable_to() and the composability of the TypeHint implementations would provide the foundation for other forms of static analysis. For example, if we wanted a special kind of match statement that could guarantee exhaustiveness, here's how we might be able to implement its logic:

class MatchAll {
	TypeHint scrutinee;
	List[TypeHint] arms;
	bool is_valid() {
		// Start from an empty union, which cannot hold any Type
		TypeHint arm_union = new OrTypeHint({});
		for (TypeHint arm : arms) {
			if (arm.is_assignable_to(arm_union)) {
				//return false because this arm is unreachable
				return false;
			}
			arm_union = new OrTypeHint({arm, arm_union});
		}
		//return true if all of the scrutinee's possible types are accounted for by the arms
		return scrutinee.is_assignable_to(arm_union);
	}
}
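
Here is the same exhaustiveness check as a small Python sketch. It assumes the empty OrTypeHint is used as the starting "covers nothing" hint; the function name match_is_valid and the int/String types are placeholders for illustration.

```python
class Type:
    def __init__(self, name, parents=()):
        self.name, self.parents = name, tuple(parents)

    def is_a(self, other):
        return other is self or any(p.is_a(other) for p in self.parents)


class BasicTypeHint:
    def __init__(self, this_type):
        self.this_type = this_type

    def can_hold(self, t):
        return t.is_a(self.this_type)

    def is_assignable_to(self, other):
        return other.can_hold(self.this_type)


class OrTypeHint:
    def __init__(self, *args):
        self.args = args

    def can_hold(self, t):
        return any(arg.can_hold(t) for arg in self.args)

    def is_assignable_to(self, other):
        return all(arg.is_assignable_to(other) for arg in self.args)


def match_is_valid(scrutinee, arms):
    """True iff no arm is unreachable and the arms cover the scrutinee."""
    covered = OrTypeHint()  # empty union: holds nothing yet
    for arm in arms:
        if arm.is_assignable_to(covered):
            return False  # earlier arms already handle everything this arm does
        covered = OrTypeHint(arm, covered)
    return scrutinee.is_assignable_to(covered)


INT, STRING = BasicTypeHint(Type("int")), BasicTypeHint(Type("String"))
scrutinee = OrTypeHint(INT, STRING)

print(match_is_valid(scrutinee, [INT, STRING]))       # True
print(match_is_valid(scrutinee, [INT]))               # False: String not covered
print(match_is_valid(scrutinee, [INT, INT, STRING]))  # False: second arm unreachable
```
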

Gradual Typing

My biggest concern about eventually adding Rust-style tagged unions to GDScript is that I think they would interact poorly with Godot's gradual typing system. GDScript's being gradually typed makes it easy to write quick, dirty code to test some functionality, while giving users the option to make it more stable by adding type info later down the line. It also makes the syntax look simple and friendly for beginning users. Having a tagged union type undermines these benefits by forcing users to worry about whether an object is a union before they can use all of its functionality. Basically, when using the explicit sum types with gradual typing, you can make an assumption about what type of union it is (e.g. Result) but not its variant (e.g. Result.OK). However, the actual functionality of the union is accessed through the union's variant. To appreciate this point, consider the following example, which uses a Result[int, String] sum type:

union Result:
	OK:
		var rval: int
	ERROR:
		var message: String

func do_something(arg: float) -> Result:
	# do something as long as arg isn't NAN or INF, in which case return an error

Now every time I want to use the return value from do_something(), I have to use a match statement. Suppose I want to do_something() three times and sum the result without worrying about type safety. Intuitively, it seems like I should be able to just write return do_something(1.0)+do_something(2.0)+do_something(3.0), but this is complicated by the fact that + isn't defined for the Result type. So instead, we end up with something awful like this:

var a = do_something(1.0)
var b = do_something(2.0)
var c = do_something(3.0)
if a is_match Result.OK(var x):
	if b is_match Result.OK(var y):
		if c is_match Result.OK(var z):
			return x+y+z
#don't care what happens here

Remember that because we are assuming that a, b, and c are all Result objects, the above code is technically the quick and dirty version! That's a lot of work to deal with an edge case I know I'm not going to trigger, and when first learning the engine or crunching during a game jam, wanting to skip such formalities is a totally valid choice that I think Godot should continue to support (if we reject this premise, then I think we should ask the larger question of whether gradual typing is really an appropriate choice for GDScript in the first place).

Also note the use of a brand new is_match operator, which we needed to create since the is operator is reserved for testing whether do_something() returned a Result type value.

Anyway, here's the alternative:

func do_something(arg: float) -> int | String:
	# do something as long as arg isn't NAN or INF, in which case return an error

var a = do_something(1.0)
var b = do_something(2.0)
var c = do_something(3.0)
return a+b+c

If we want to add type safety down the line, we can do that:

var a: int|String = do_something(1.0)
var b: int|String = do_something(2.0)
var c: int|String = do_something(3.0)
# the static analyzer should flag "return a+b+c" as an issue since strings and ints can't be added
if (a is int) and (b is int) and (c is int): 
	return a+b+c
else:
	# don't care what happens here

Note that with algebraic type hints, we don't need any special operator for checking discriminators, since Godot's Variant type provides a discriminator for free, and the is operator already exists to check it (I'm simplifying here; I understand that checking script types via is is more complex than that, but even that process still starts by checking the variant type).
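
As a point of comparison (an analogy from another language, not GDScript), Python's union hints already work this way: the value carries its own runtime type, so an ordinary isinstance() check plays the role of is and no union-specific operator is needed. The body of do_something() below is made up purely so the example runs.

```python
import math
from typing import Union


def do_something(arg: float) -> Union[int, str]:
    # Hypothetical body: return an error message for NaN/inf, otherwise an int.
    if math.isnan(arg) or math.isinf(arg):
        return "arg must be finite"
    return int(arg) * 2


def sum_three() -> Union[int, str]:
    a = do_something(1.0)
    b = do_something(2.0)
    c = do_something(3.0)
    # The runtime type of each value is the discriminator.
    if isinstance(a, int) and isinstance(b, int) and isinstance(c, int):
        return a + b + c
    return "at least one call failed"


print(sum_three())                   # 12
print(do_something(float("nan")))    # arg must be finite
```
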

Someone still committed to using tagged unions might argue that in order to reduce the amount of boilerplate, we should add implicit casting rules. For example, if tagged unions are only allowed to hold one value per variant, we could allow casts from the union to any of its variants' types (e.g. Result[int, String] could be cast to int or String) and have the analyzer mark such casts as unsafe. This would make casting from a union type to a non-union type relatively straightforward to write.

However, moving data into unions would still be tedious. One of the main advantages tagged unions allow over algebraic type hints is that they would allow distinguishing between different cases of the same type (e.g. Result[string,string]). However, this means that we can't easily cast a string to a Result[string,string] because there wouldn't be a good way to know whether we intended to construct a Success or an Error. This is still a problem for algebraic type hints, but to a lesser extent. For example, if we wanted to recreate the Result[String,String] union using algebraic type hints, we could technically do it as

struct Result:
	struct Success:
		var message: String
	struct Error:
		var message: String
	var result: Success | Error
	
func foo(a: Result) -> void

However, I suspect that more often, it will make more sense to simply define the Error struct and then define func foo(x: String|Error) -> void. To appreciate why func foo(x: String|Error) -> void might sometimes be preferred to func foo(a: Result) -> void, imagine declaring a signal: signal int_signal(x: String). Since String is assignable to String | Error, we can safely connect foo(x: String|Error) directly to int_signal. On the other hand, to connect foo(Result) to int_signal, we'll need a lambda or something similar. So int_signal.connect(foo) will become int_signal.connect(func(x): foo(Result.Success.new(x))). My suspicion is that something similarly cumbersome would need to happen if Result were expressed with tagged unions instead of algebraic type hints. Being forced to add connections indirectly via lambdas would be tedious to do in script and likely even more so from the editor (if it were possible at all). We might be able to minimize this problem by adding even more implicit casting rules (for example, we might say that if a union's variant types can never overlap, you can safely cast to that union from any of those types), but my sense is that achieving the kind of safety and ergonomics which algebraic type hints would offer would require a convoluted set of rules (what happens when trying to assign a Result[T,E] to an Option[T]? What about nested union types such as Result[Option[T]]?) and/or eat away at the performance benefits gained by using an int to discriminate between the types instead of the more complex logic associated with querying ClassDB. I'd be happy to be proven wrong here though.
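
To make the signal argument concrete, here is a toy Python model of the connection check. It treats a type hint as a plain set of type names and ignores inheritance, which is a deliberate simplification; can_connect() is a hypothetical helper, not an engine function.

```python
def hint(*names):
    """A type hint modeled as the set of type names it admits (no inheritance)."""
    return frozenset(names)


def is_assignable(value_hint, target_hint):
    # Every type the value might have must be admitted by the target.
    return value_hint <= target_hint


def can_connect(signal_param, handler_param):
    """A handler can be connected directly iff it accepts everything the signal emits."""
    return is_assignable(signal_param, handler_param)


STRING = hint("String")
STRING_OR_ERROR = hint("String", "Error")
RESULT = hint("Result")

print(can_connect(STRING, STRING_OR_ERROR))  # True: foo(x: String|Error) connects directly
print(can_connect(STRING, RESULT))           # False: foo(a: Result) needs a wrapping lambda
```
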

Tagged Unions and Generics

A short-term concern about using tagged unions instead of algebraic type hints is that GDScript currently lacks generic types. In my opinion, this is a significant limitation of the language's static typing system which should be addressed at some point, but that's outside the scope of this issue. Nevertheless, I bring generics up here because some of the most salient use cases for tagged unions, including the Option and Result types, would actually require both union types and generic types in order to be really useful. Sticking with the Option example, if tagged unions were added without generics, then either we would need to declare a different Option type for every type we would like to pass around (e.g. IntOption, StringOption, NodeOption, etc.) or Option would need to return either None or Some(Variant). The former case would be difficult to maintain and difficult to use. The latter case would succeed in preventing people from accidentally reading null pointers, but only at the expense of type safety. Personally, I don't think either of those tradeoffs is worth it. On the other hand, because anonymous sum types are declared where they are used, they are always type safe while still giving the static analyzer the information necessary to ensure correct usage.

Again, my point is not that GDScript should never have generics. I think it should and I agree with @julian-a-avar-c that algebraic types and generic types would be complementary (in fact, by relaxing the assumption that GDScript type hints correspond one-to-one with specific types, we may even carve out space for Java-style generic wildcards, which may be necessary for creating ergonomic, type-safe signals). As things stand now though, GDScript doesn't have generic types, and adding tagged unions without generic types would be underwhelming.

Anonymous Sum Types can be used to implement Explicit Sum types

Even though I think adding tagged unions to the engine's main APIs would make them unnecessarily tedious to work with, I do think that they have legitimate uses which the algebraic type hints don't cover directly. The most notable example, pointed out by several people including @olson-sean-k, is the case of T1 | T1 | T1. However, while this is not directly expressible with anonymous sum types, it can be indirectly expressed using a combination of anonymous sum types and product types (e.g. classes and structs). Here's what it would look like to write the MobState example in terms of the proposed struct feature:

struct MobState:
	struct IDLE:
		pass
	struct ATTACKING:
		var target: Node2D
		var power: int
	var state: IDLE | ATTACKING

match MobState.state:
	is MobState.ATTACKING var state:
		print("changing mob target")
		state.target = %Player2
	is MobState.IDLE:
		print("mob isn't attacking")

I'd like to call attention to a couple of salient details about the MobState struct.

  1. Please note that this only requires one extra line of code compared to the original example (the line where we declare that the MobState struct has a state variable whose type is either IDLE or ATTACKING.) The state variable is how we get around the problem of determining which discriminator we want to query via is. If you want to know if a value is a MobState, you test the variable, and if you want to know if a MobState is IDLE you test the state. This is less performant, since is is more expensive than checking an enum, and it's a little less ergonomic since we have to explicitly state that we're accessing the state.
  2. Also please note I have introduced a new form of pattern matching here (pattern matching a variable's type is currently impossible). However, unlike with explicit sum types, introducing a special operator for testing unions isn't necessary since we're using the type of state as the discriminator for the union, and the is operator already exists to query that. Also note that the new form of pattern matching is useful for more than just simulating Rust enums. In fact, I have lifted the pattern matching syntax here from the comments on proposal GDScript match should work for type matching #3733, which was a discussion of how to match on types for other purposes such as input handling routines and collision detection. In other words, adding special match operators is unnecessary with anonymous sum types, but if we did expand on GDScript's existing pattern matching functionality, those changes would be useful in a variety of common scenarios, whereas explicit sum types would require the addition of single-purpose operators.
  3. Finally, notice that the MobState.IDLE struct has no members. That doesn't mean it has no data though, because its type is data. If you wanted to define other states with no members, you would just need to define a new struct for each and add it to the state variable's type hint.
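
For readers who find it easier to see this pattern in a language that already has union hints, here is a rough Python analogue of the MobState example. It uses dataclasses in place of the proposed structs and isinstance() in place of is; the describe() helper and the string target are stand-ins invented for the sketch.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Idle:
    # No members: the type itself is the data.
    pass


@dataclass
class Attacking:
    target: str  # stands in for a Node2D reference
    power: int


@dataclass
class MobState:
    # The wrapped variable's runtime type acts as the discriminator.
    state: Union[Idle, Attacking]


def describe(mob: MobState) -> str:
    state = mob.state
    if isinstance(state, Attacking):
        state.target = "Player2"
        return "changing mob target"
    if isinstance(state, Idle):
        return "mob isn't attacking"
    raise TypeError("unexpected state type")


attacker = MobState(Attacking(target="Player1", power=3))
print(describe(attacker))          # changing mob target
print(attacker.state.target)       # Player2
print(describe(MobState(Idle())))  # mob isn't attacking
```
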

On the question of nullability

@chanon's original post was concerned with Godot's lack of nullable types. In this thread, there have been a variety of proposed solutions to this. I agree with the people who have suggested treating nullable and sum types as a unified concept, albeit not by adding an Option type. For the purposes of this discussion, I'm going to make a distinction between Nil, which I'll use to refer to a specific type which has no members, and null, which I'll use to mean a Nil-type value.

@Bromeon has voiced two arguments against treating union and nullable types as a unified concept:

Also, I think nullable types are completely independent of unions and occur much more frequently. They also deserve a special syntax T?, because T | null is just verbose and doesn't add anything. A null type on its own can only ever have one value, so carries no information -- there's thus no need to make it a static type and allow parameter: null. There's already void to signal "no return value".

I agree that nullable types occur more frequently than other union types and therefore deserve a special syntax T?. However, I would still like to see nullable types handled by the algebraic type hint system. To this end, I would favor adding T? as syntactic sugar for T | Nil. This would be less flexible than an Option type, insofar as it would result in int? | bool?, int? | bool, and int | bool | Nil all being treated as equivalent, but I can't think of a case where distinguishing between int?'s null and bool?'s null would be desirable (and if such behavior ever was desirable outside of the engine's APIs, developers can always roll their own Option using the strategy I described above). If anything, treating these as equivalent strikes me as desirable.
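
The equivalence claimed above falls out naturally if unions are flattened into sets. A small Python sketch, again modeling a hint as a set of type names (the helper names hint, nullable, and union are made up for this illustration):

```python
NIL = "Nil"


def hint(*names):
    """A type hint modeled as the set of type names it admits."""
    return frozenset(names)


def nullable(h):
    """T? desugars to T | Nil."""
    return h | {NIL}


def union(*hints):
    """A | B flattens nested unions, so a repeated Nil collapses into one."""
    return frozenset().union(*hints)


a = union(nullable(hint("int")), nullable(hint("bool")))  # int? | bool?
b = union(nullable(hint("int")), hint("bool"))            # int? | bool
c = hint("int", "bool", NIL)                              # int | bool | Nil

print(a == b == c)  # True
```
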

I'm not sure I agree with @Bromeon's second concern, which is that there's no need to add a Nil type because there's never a case where you would want a variable which could only ever hold null. Consider the case of inheritance. Imagine a class AbstractFoo with a method stub func bar(x: int) -> float?. The intent is that implementations of bar() will return either a float or nothing at all, with the specifics left to each implementation. Now suppose I have a child class whose bar() implementation will always return null. In that case, I actually would want to declare the child class's method as func bar(x: int) -> Nil (this would be a perfectly legal example of return type covariance).
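
Under the same set-of-names model, the covariance rule is just a subset check: an override is fine as long as everything it can return is something the parent promised. This is a sketch under that simplifying assumption (no inheritance between the named types), and override_return_is_valid() is a hypothetical helper.

```python
def hint(*names):
    """A type hint modeled as the set of type names it admits."""
    return frozenset(names)


def override_return_is_valid(parent_return, child_return):
    """Return type covariance: the child may only narrow what the parent returns."""
    return child_return <= parent_return


parent_bar = hint("float", "Nil")  # func bar(x: int) -> float?
child_bar = hint("Nil")            # func bar(x: int) -> Nil

print(override_return_is_valid(parent_bar, child_bar))                # True
print(override_return_is_valid(parent_bar, hint("float", "String")))  # False
```
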

@dalexeev
Member

@unfallible I also think that first-class types would provide a more universal and reliable solution to many of the problems with Godot's type system, including nested types (like Array[Array[int]]), union types, and generics.

See also my gist. Instead of inheritance, we could use type parameters. That is, the union type A|B|C can be represented as Union[A, B, C]. (For better performance in the case of only built-in Variant types, we could use a bitmask cache.)

As for property info hint string, I think we could introduce the FQTN (Fully Qualified Type Name) string format instead. The Type class (I call it GodotType) could have conversion methods: to_property_info(), to_fqtn(), from_fqtn(), etc.

I plan to write a separate proposal about the unified type system with a description of the current state, problems and a possible solution as I see it.

@unfallible

unfallible commented Mar 24, 2024

@dalexeev Thank you for bringing your GodotType proof of concept to my attention; I'm looking forward to studying it closely. I agree with you that first class types would be a great addition to the engine, but the first class types you're talking about sound distinct from but complementary to the "algebraic type hints" I was describing (and I'm sorry if my terminology here is confused; I haven't studied this stuff since undergrad).

I want to clarify that what I was calling a TypeHint was more than just a parser node for interpreting the "property info hint string," although my choice of names seems to have been unhelpful. The reason I was advocating for "algebraic type hints" and not "algebraic data types" is that I was envisioning the most complex logic happening at compile time, not run time. Although I didn't say this explicitly, one hope I have (although people who know the engine better might see a reason why this is unrealistic) is to implement this TypeHint class (which again is not just the property info hint string) in a way that has minimal impact outside of the GDScript module. The bulk of my post was intended to address the question of how such a TypeHint class would work, and why I believed it would be more ergonomic than the Rust-style tagged unions favored by commenters like @olson-sean-k. The Type class in my post was just a placeholder to explain the logic of how a TypeHint would check whether one TypeHint could be substituted for ("assigned to") another; however I imagined it could be implemented as a new GodotType class or just leverage the existing GDScriptParser::DataType struct.

I was envisioning the compiler doing something like the following:

  1. The property info hint string would be used to construct a Type object.
  2. The Type objects would be used to construct the BasicTypeHint objects.
  3. The OrTypeHint and AndTypeHint objects would be composed of other TypeHint objects (all the way down to the BasicTypeHint).
  4. The analyzer would use the TypeHint objects to guarantee that all assignments, function calls, etc. are safe.
  5. The compiler would use the TypeHint objects to look up different method dispatches, etc. and then discard the TypeHint objects. All of our assignments are guaranteed to be safe, so there's no need to track this info at runtime.
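
The steps above can be sketched as a toy pipeline in Python. Everything here is hypothetical: the "A|B" hint text format, the (name, target hint, value hint) declaration tuples, and the ("assign", name) output are invented to show where the hints are built, checked, and then dropped.

```python
def parse_hint(text):
    """Steps 1-3: turn hint text such as "Node2D|Node3D" into a set of type names."""
    return frozenset(part.strip() for part in text.split("|"))


def check_assignment(target_hint, value_hint):
    """Step 4: an assignment is safe iff every possible value type is admitted."""
    return value_hint <= target_hint


def compile_assignments(decls):
    """Step 5: emit instructions that carry no hint information at all."""
    compiled = []
    for name, target_text, value_text in decls:
        target, value = parse_hint(target_text), parse_hint(value_text)
        if not check_assignment(target, value):
            raise TypeError(f"cannot assign {value_text} to {name}: {target_text}")
        compiled.append(("assign", name))  # the hints are discarded here
    return compiled


program = [
    ("test_var", "Node2D | Node3D", "Node2D"),
    ("test_var2", "Node2D | Node3D | Nil", "Node2D | Node3D"),
]
print(compile_assignments(program))  # [('assign', 'test_var'), ('assign', 'test_var2')]
```
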

Here's an imperfect analogy for thinking about what I meant by TypeHint and how it relates to Type. Algebraic type hints would be to first class types as enum is to int. When you declare a variable to be an enum, you tell the compiler that the variable can only hold a specific subset of the possible int values; likewise, when you declare a variable with a TypeHint, you tell the compiler that the variable's Type can only belong to a specific subset of the possible Type values. When an enum variable is compiled, the compiled code has no knowledge of what subset of the possible int values that variable could hold; likewise, when code with a TypeHint is compiled, the compiled code forgets what subset of the possible Type values that variable could hold.

My concerns were primarily directed at the ergonomics of Rust-style enums in GDScript, particularly when writing untyped code. Again, I haven't gotten the chance to properly dig into the code in your gist yet, so I don't know whether my concerns would be applicable to your solution, but I do want to make sure people understand what I was suggesting.

Update:

I've had a chance to study your GodotType code now, and now I'm wondering if you and I are focusing on different questions. This may not have been clear enough in my first post, but I was more concerned with how GDScript sum types should behave than how they're implemented. I don't care whether this kind of type system is implemented using a TypeHint class and GDScriptParser::DataType, a TypeHint class and a GodotType class, or just a GodotType class.

I kept emphasizing that TypeHint and Type are different and that TypeHint should be "forgotten" at runtime, but reading your gist made me realize that what I actually care about is that the union type should not be instantiable. If sum and product types were just treated as abstract supertypes of their variants, that would still produce the behaviors I'm advocating for.

What I do care about is how sum types should behave. In Rust (and maybe Swift? I've never done anything in Swift, but a cursory Google search suggests it works the same way), if you want to assign a value to an enum, you must explicitly specify the enum variant you're assigning. For example: let test: Option<i32> = Some(7);. Also, whenever you use a variant's value, Rust forces you to check the discriminator to ensure that the enum actually contains the expected variant.

My main contention is that both of these demands are too onerous for GDScript. GDScript is supposed to let developers mostly ignore the type system in order to make the language easier to learn and develop in. Since Rust enums force you to interact with the type system to perform simple actions such as assignments, they would be a bad fit for the language. In my view, the following should be type safe in GDScript:

var test_var: Node2D|Node3D = $SomeNode2D
var test_var2: Node2D|Node3D|Nil = test_var
var child_list: Array[Node] = test_var.get_children()

Maybe someone knows something I don't, but I don't see a way to recreate this syntax without implementing a convoluted set of rules for implicitly casting into tagged union variants. I'm interested to hear others' views on all this though.
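
Here is a small Python sketch of the two rules that would make those three lines check out: a union is assignable to another hint if each of its members is, and a method call on a union is safe if every member provides the method. The PARENTS and METHODS tables are a tiny hand-written excerpt for illustration, not data pulled from ClassDB.

```python
# Hand-written excerpt of the class tree, for illustration only.
PARENTS = {"Node": None, "CanvasItem": "Node", "Node2D": "CanvasItem",
           "Node3D": "Node", "Nil": None}
METHODS = {"Node": {"get_children"}}


def is_a(type_name, other):
    """Walk up the parent chain looking for `other`."""
    while type_name is not None:
        if type_name == other:
            return True
        type_name = PARENTS[type_name]
    return False


def is_assignable(value_hint, target_hint):
    """Each possible value type must fit some member of the target union."""
    return all(any(is_a(v, t) for t in target_hint) for v in value_hint)


def has_method(type_name, method):
    while type_name is not None:
        if method in METHODS.get(type_name, set()):
            return True
        type_name = PARENTS[type_name]
    return False


def can_call(hint, method):
    """A call on a union is safe only if every member provides the method."""
    return all(has_method(t, method) for t in hint)


node2d_or_3d = {"Node2D", "Node3D"}
print(is_assignable({"Node2D"}, node2d_or_3d))                     # True
print(is_assignable(node2d_or_3d, {"Node2D", "Node3D", "Nil"}))    # True
print(can_call(node2d_or_3d, "get_children"))                      # True
print(can_call({"Node2D", "Node3D", "Nil"}, "get_children"))       # False
```

The last line shows why test_var2.get_children() would need a null check first under this model, while test_var.get_children() would not.
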

@deniszholob

deniszholob commented Apr 3, 2024

For me this would be more beneficial for type safety when handling nulls

Below is a simplified example of possible null values from inputs or array indexing. In my experience coming from TypeScript, null vs. Object return types are very common, and it would be great to be able to safeguard against null runtime errors better.

@export var item_data: ItemData | null # Can be null here since the user can leave this out in the node tree (whether accidentally or on purpose)
func grab_slot_data(index: int) -> SlotData | null:
	var slot_data: SlotData | null = slot_datas[index]
	return slot_data

Unless I'm missing another way to mark nullable values; in that case please let me know, thanks :)

@julian-a-avar-c

@deniszholob That is under work, iirc it's under review? Issue #162 & godotengine/godot#76843. The approach that is being taken right now is nullable types as its own implementation separate from union types. As far as I can see, we are moving away from implicit nulls 🎉
