Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

slow parsing of getHeaderElementTMY3 #950

Closed
Mathadon opened this issue Jun 14, 2018 · 6 comments
Closed

slow parsing of getHeaderElementTMY3 #950

Mathadon opened this issue Jun 14, 2018 · 6 comments
Assignees
Labels

Comments

@Mathadon
Copy link
Member

When a TMY3 file does not contain the correct header, it searches through the entire file for the header. On my PC on linux this takes 10 minutes for a file containing 8800 lines. This process happens during translation, due to which it seems as if dymola has stalled.

The problem seems that the implementation of Modelica.Utilities.Strings.find is quite inefficient, which is used in the for loop:

  while (not EOF) and (index == 0) loop
    iLin:=iLin + 1;
    (lin, EOF) :=Modelica.Utilities.Streams.readLine(fileName=filNam,
      lineNumber=iLin);
    index :=Modelica.Utilities.Strings.find(
      string=lin,
      searchString=start,
      startIndex=1,
      caseSensitive=false);
  end while;

See also lbl-srg/BuildingsPy#203.

This issue could possibly be resolved by only checking for the keyword start at the start of the line, e.g. by using

     while (not EOF) and (index == 0) loop
       iLin:=iLin + 1;
       (lin, EOF) :=Modelica.Utilities.Streams.readLine(fileName=filNam,
         lineNumber=iLin);
          if Modelica.Utilities.Strings.length(lin)>=startLen then
            if Modelica.Utilities.Strings.isEqual(
               Modelica.Utilities.Strings.substring(lin,1,startLen),
               start,
               caseSensitive=false) then
                 index:=1;
            else
              index:=0;
            end if;
          else
            index:=0;
          end if;
     end while;

do we want to change this implementation?

@Mathadon
Copy link
Member Author

@Mathadon to implement this and clearly mention that the line should start with the tag

Mathadon added a commit that referenced this issue Jun 15, 2018
@Mathadon
Copy link
Member Author

Mathadon commented Jun 15, 2018

There is an implementation in https://github.com/ibpsa/modelica-ibpsa/tree/issue950_tmyheader . However, the code produces weird errors when the tag is not found.

E.g. when running IBPSA.BoundaryConditions.WeatherData.Examples.ReaderTMY3 when first adjusting IBPSA/Resources/weatherdata/USA_IL_Chicago-OHare.Intl.AP.725300_TMY3.mos such that the third line (#LOCATION,Chicago Ohare Intl Ap,IL,USA,TMY3,725300,41.98,-87.92,-6.0,201.0) is removed:

Error when reading line 733 from file
"IBPSA/Resources/weatherdata/USA_IL_Chicago-OHare.Intl.AP.725300_TMY3.mos":
No such file or directory
The stack of functions is:
IBPSA.BoundaryConditions.WeatherData.BaseClasses.getHeaderElementTMY3

I tried changing the file, the contents of line 733 does not matter. All previous lines are read correctly. The number of characters before line 733 does seem to matter. The issue only appears when the file has 92533 characters.

When the number of characters is less, the following error is produced:

stringmark+len+1<Endsimplestring
The following error was detected at time: 0
Room to allocate string
The stack of functions is:
IBPSA.BoundaryConditions.WeatherData.BaseClasses.getHeaderElementTMY3

until the number of lines is <=730 (~92290 characters), then the error correctly becomes:

 not EOF
The following error was detected at time: 0
Error: Did not find keyword '#LOCATION' at the start of a line in the weather file.
   Check for correct weather file syntax.
The stack of functions is:
IBPSA.BoundaryConditions.WeatherData.BaseClasses.getHeaderElementTMY3

The funny errors are produced while calling Modelica.Utilities.Streams.readLine. This calls from ModelicaInternal.c:

MODELICA_EXPORT const char* ModelicaInternal_readLine(const char* fileName, int lineNumber, int* endOfFile) {
    /* Read line lineNumber from file fileName */
    FILE* fp = ModelicaStreams_openFileForReading(fileName, lineNumber - 1);
    char* line;
    int c, c2;
    size_t lineLen;
    long offset;
    char localbuf[200]; /* To avoid fseek */

    if (feof(fp)) {
        goto END_OF_FILE;
    }

    /* Determine length of line lineNumber */
    offset  = ftell(fp);
    lineLen = 0;
    c = fgetc(fp);
    c2 = c;
    while ( c != '\n' && c != EOF ) {
        if (lineLen < sizeof(localbuf)) {
            localbuf[lineLen] = (char)c;
        }
        lineLen++;
        c2 = c;
        c = fgetc(fp);
    }
    if ( lineLen == 0 && c == EOF ) {
        goto END_OF_FILE;
    }

    /* Read line lineNumber */
    if ( lineLen > 0 && c2 == '\r') {
        lineLen--;
    }
    line = ModelicaAllocateStringWithErrorReturn(lineLen);
    if ( line == NULL ) {
        goto Modelica_ERROR3;
    }

    if (lineLen <= sizeof(localbuf)) {
        memcpy(line, localbuf, lineLen);
    }
    else {
        if ( fseek(fp, offset, SEEK_SET) != 0 ) {
            goto Modelica_ERROR3;
        }
        if ( fread(line, sizeof(char), lineLen, fp) != lineLen ) {
            goto Modelica_ERROR3;
        }
        fgetc(fp); /* Read the EOF/new-line. */
    }
    CacheFileForReading(fp, fileName, lineNumber);
    line[lineLen] = '\0';
    *endOfFile = 0;
    return line;
    /* End-of-File or error */
END_OF_FILE:
    fclose(fp);
    CloseCachedFile(fileName);
    *endOfFile = 1;
    line = ModelicaAllocateString(0);
    return line;

Modelica_ERROR3:
    fclose(fp);
    CloseCachedFile(fileName);
    ModelicaFormatError("Error when reading line %i from file\n\"%s\":\n%s",
        lineNumber, fileName, strerror(errno));
    return "";
}

The error Error when reading line is produced when calling ModelicaAllocateStringWithErrorReturn for which the MSL does not contain a definition. This definition seems tool-specific and is contained by matrixop.c:

DYMOLA_STATIC char* ModelicaAllocateStringWithErrorReturn(size_t len) {
        EnsureMarkInitialized();
        if (stringmark+len+1<Endsimplestring)
                return ModelicaAllocateString(len);
        else return 0;
}

the error disappears when increasing the buffer size char simplestring[100000]; in matrixop.h.

Conclusion: this is a bug in Dymola. The least they could do is produce a clear error message when the buffer overflows. I'll report this.

In case someone wants to reproduce this: make sure to compile IBPSA.BoundaryConditions.WeatherData.BaseClasses.getHeaderElementTMY3 separately before calling an example that uses the TMY reader. Dymola seems to cache the compiled functions such that you have to compile it manually to make sure that the code is updated.

@Mathadon Mathadon added the external This issue requires an external fix/update label Jun 15, 2018
@Mathadon
Copy link
Member Author

I added the label 'external' to indicate that we better wait on an external bugfix before merging this. Otherwise users will get these obscure warnings.

@Mathadon
Copy link
Member Author

The answer of the developers is essentially that this is a known limitation and they are not going to change it since it is an easy way of avoiding memory overflows.

I got the suggestion to use a custom C implementation to avoid this behavior. I'll try to make some time for this. Or do we want to stick to the MSL implementation?

@Mathadon Mathadon removed the external This issue requires an external fix/update label Jun 26, 2018
@mwetter
Copy link
Contributor

mwetter commented Jun 26, 2018 via email

@mwetter
Copy link
Contributor

mwetter commented Apr 12, 2021

I will close this for now due to inactivity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants