Bug in generating objects in union of types. #53

LorrinWWW · 2024-01-06T09:57:10Z

Here is an minimal example:

schemas = {
 'title': 'Functions',
 'type': 'array',
 'items': {'anyOf': [{'$ref': '#/definitions/apple'},
   {'$ref': '#/definitions/banana'}]},
 'definitions': {
     'apple': {
         'type': 'object',
         'properties': {'name': {'enum': ['apple'], 'type': 'string'}},
         'required': ['name']
     },
     'banana': {
         'type': 'object',
         'properties': {'name': {'enum': ['apple'], 'type': 'string'}},
         'required': ['name']
     }
 }
}

parser = JsonSchemaParser(schemas)
prefix_function = build_transformers_prefix_allowed_tokens_fn(pipe.tokenizer, parser)

prompt = """Please generate a list of fruits."""

# Call the pipeline with the prefix function
output_dict = pipe(prompt, max_new_tokens=100, prefix_allowed_tokens_fn=prefix_function)

It generates:

Please generate a list of fruits.











[
  {
    "name" 
 
 
 
 
 
name": "apple"
 
 
 
 
 
 "apple"
 
 
 
 
 
 },
  {
    "name" 

 
 
 
 
 name": "apple"

which does not look like a valid JSON.

The text was updated successfully, but these errors were encountered:

LorrinWWW · 2024-01-06T09:59:09Z

@noamgat Any advice on walking around this issue? Thank you so much!

LorrinWWW · 2024-01-06T10:17:16Z

Should be irrelevant to array. Here is another example:

schemas = {
 'title': 'Functions',
 'type': 'array',
 'items': {'anyOf': [{'$ref': '#/definitions/apple'},
   {'$ref': '#/definitions/banana'}]},
 'definitions': {
     'apple': {
         'type': 'object',
         'properties': {'name': {'enum': ['apple'], 'type': 'string'}},
         'required': ['name']
     },
     'banana': {
         'type': 'object',
         'properties': {'name': {'enum': ['apple'], 'type': 'string'}},
         'required': ['name']
     }
 }
}

from typing import List, Union
from pydantic import BaseModel
import json


class Apple(BaseModel):
    name: str
    size: int

class Banana(BaseModel):
    name: str
    number: int

class Fruit(BaseModel):
    fruit: Apple|Banana


parser = JsonSchemaParser(Fruit.schema())
prefix_function = build_transformers_prefix_allowed_tokens_fn(pipe.tokenizer, parser)

# Call the pipeline with the prefix function
output_dict = pipe("Please generate a fruit:", max_new_tokens=100, prefix_allowed_tokens_fn=prefix_function)

generates:

Please generate a fruit:











{
  "fruit": {
    "name"
 
 
 
 
 
 name": "apple"
 
 
 
 
 
 "},  "
 
 
 
 
 
,"number"
 
 
 
 
 
 size

with a error log:

ERROR:root:Unknown LMFormatEnforcer Problem. Prefix: '











{
  "fruit": {
    "name"
 
 
 
 
 
 name": "apple"
 
 
 
 
 
 "},  "
 
 
 
 
 
 ,"number"
 
 
 
 
 
 size'
Terminating the parser. Please open an issue at 
https://github.com/noamgat/lm-format-enforcer/issues with the prefix and CharacterLevelParser parameters
Traceback (most recent call last):
  File "/home/[email protected]/miniconda3/envs/nebula-fav2/lib/python3.10/site-packages/lmformatenforcer/tokenenforcer.py", line 81, in _compute_allowed_tokens
    self._collect_allowed_tokens(state.parser, self.tokenizer_tree.root, allowed_tokens, shortcut_key)
  File "/home/[email protected]/miniconda3/envs/nebula-fav2/lib/python3.10/site-packages/lmformatenforcer/tokenenforcer.py", line 122, in _collect_allowed_tokens
    self._collect_allowed_tokens(next_parser, next_tree_node, allowed_tokens, None)
  File "/home/[email protected]/miniconda3/envs/nebula-fav2/lib/python3.10/site-packages/lmformatenforcer/tokenenforcer.py", line 120, in _collect_allowed_tokens
    next_parser = parser.add_character(character )
  File "/home/[email protected]/miniconda3/envs/nebula-fav2/lib/python3.10/site-packages/lmformatenforcer/jsonschemaparser.py", line 69, in add_character
    updated_parser.object_stack[receiving_idx] = updated_parser.object_stack[receiving_idx].add_character(new_character)
  File "/home/[email protected]/miniconda3/envs/nebula-fav2/lib/python3.10/site-packages/lmformatenforcer/characterlevelparser.py", line 89, in add_character
    next_parsers = [parser.add_character(new_character) for parser in relevant_parsers]
  File "/home/[email protected]/miniconda3/envs/nebula-fav2/lib/python3.10/site-packages/lmformatenforcer/characterlevelparser.py", line 89, in <listcomp>
    next_parsers = [parser.add_character(new_character) for parser in relevant_parsers]
  File "/home/[email protected]/miniconda3/envs/nebula-fav2/lib/python3.10/site-packages/lmformatenforcer/jsonschemaparser.py", line 270, in add_character
    value_schema = self.schema_object.properties[self.current_key]
KeyError: 'size'

LorrinWWW · 2024-01-06T17:53:39Z

It seems that, when updating a union char parser, it will append multiple key string parsers for its child parsers (https://github.com/noamgat/lm-format-enforcer/blob/1ee9d39de93e630e2cd5c041ebac3c8352853296/lmformatenforcer/jsonschemaparser.py#L279C39-L279C39).
So there will be multiple key string parser in the object_stack. And thus, those key string parsers are trigger multiple times, causing multiple keys generated, eventually messing up the parser states.

@noamgat Do you think it's a fundamental issue or not?

noamgat · 2024-01-07T11:57:13Z

Thanks for the report! I was able to create a failing unit test based on it, and will investigate and solve.

noamgat · 2024-01-07T13:32:38Z

I pushed a fix to the problem. Can you test the branch via

pip install git+https://github.com/noamgat/lm-format-enforcer.git@feature/union_typed_arrays

And check if you get better results?

LorrinWWW · 2024-01-07T19:38:08Z

@noamgat Awesome! It works well on my side. Thank you so much for your quick response!

Solves #53 * Added failing test based on issue #53 * Improved support for union typed objects in arrays

noamgat · 2024-01-10T18:45:08Z

Released in 0.8.2

LorrinWWW changed the title ~~Bug in generating array of objects in two or more types.~~ Bug in generating objects in union of types. Jan 6, 2024

noamgat added a commit that referenced this issue Jan 7, 2024

Added failing test based on issue #53

c1bb065

noamgat added the bug Something isn't working label Jan 7, 2024

noamgat mentioned this issue Jan 10, 2024

Feature/union typed arrays #58

Merged

noamgat added a commit that referenced this issue Jan 10, 2024

Feature/union typed arrays (#58)

48d2c4d

Solves #53 * Added failing test based on issue #53 * Improved support for union typed objects in arrays

noamgat closed this as completed Jan 10, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bug in generating objects in union of types. #53

Bug in generating objects in union of types. #53

LorrinWWW commented Jan 6, 2024

LorrinWWW commented Jan 6, 2024

LorrinWWW commented Jan 6, 2024 •

edited

Loading

LorrinWWW commented Jan 6, 2024

noamgat commented Jan 7, 2024 •

edited

Loading

noamgat commented Jan 7, 2024

LorrinWWW commented Jan 7, 2024

noamgat commented Jan 10, 2024

Bug in generating objects in union of types. #53

Bug in generating objects in union of types. #53

Comments

LorrinWWW commented Jan 6, 2024

LorrinWWW commented Jan 6, 2024

LorrinWWW commented Jan 6, 2024 • edited Loading

LorrinWWW commented Jan 6, 2024

noamgat commented Jan 7, 2024 • edited Loading

noamgat commented Jan 7, 2024

LorrinWWW commented Jan 7, 2024

noamgat commented Jan 10, 2024

LorrinWWW commented Jan 6, 2024 •

edited

Loading

noamgat commented Jan 7, 2024 •

edited

Loading