Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Self-compiled project works not reliable #29

Closed
lakeroe opened this issue Dec 12, 2017 · 42 comments
Closed

Self-compiled project works not reliable #29

lakeroe opened this issue Dec 12, 2017 · 42 comments

Comments

@lakeroe
Copy link

lakeroe commented Dec 12, 2017

Hello USBCNC,

I flashed an older hex file you provided here on a blue pill STM32F103C8T6 board, installed ST's VCP driver and did a first test.
I streamed a pretty big g-code file (860.000 lines, running time ~2hours) using bCNC -> this works without a problem.

However, the same test using a self-compiled file only works for a few minutes but after some random time (between 5 and 15 minutes) it hangs.

I already tried different compiler options and optimization levels but can't get it to work reliable.
A next test is to use an older compiler. I currently use 6.3.1-2017q2 and I think you used 4.9-2015q3 as I saw here.

Do you have any further ideas ?

I'm interested in a stable and reliable GRBL STM32 port, so any help is highly appreciated.

Thanks and best regards,
lakeroe

@lakeroe
Copy link
Author

lakeroe commented Dec 13, 2017

Hello together,

after some more testing I finally got it to work.
I compiled the project using different compilers and 100% same settings.

  1. GCC V6.3.1 20170620
    Binary size: 38960 Bytes -> NOT working

  2. GCC V4.9.3 20150529
    Binary size: 38720 Bytes -> working

It would be interesting to find the root of the problem.
My best guess is some compiler optimization changes.

Let me know if you're interested in further investigation, I'm willing to help if you like !

Best regards,
lakeroe

makefile.bat

@usbcnc
Copy link
Owner

usbcnc commented Jan 2, 2018

I am working on another project of STM32. I think compile option does makes difference for USB performance. In the mean time there is known issue regarding the resistor used on the board (should be 1.5K not 10k). For me it works better if I hook up with a USB 2.0 hub.

@usbcnc usbcnc closed this as completed Jan 2, 2018
@radiolee
Copy link

@usbcnc Did you mean the pull resistor between USB D+ and VDD ?

@usbcnc
Copy link
Owner

usbcnc commented Jan 11, 2018

@radiolee Yes. The soldered resistor is 10K. It will work on some PC but not work on the others.

@radiolee
Copy link

radiolee commented Jan 11, 2018

@usbcnc Thanks! My stm32f103c8t6 mini board is on the road. I'll check it when I get it.
BTW, what version of toolchain are you using, still 4.9 2015q3 or later?
I use 2015q3 and build with default configuration:
Program Size:
text data bss dec hex filename
37396 1324 5228 43948 abac stm32grbl.elf

@radiolee
Copy link

@usbcnc I got the board with 1.5k pullup res.
1st run without stepper driver, looks 2x faster with same settings in cncjs. thanks for your great work!

@lakeroe
Copy link
Author

lakeroe commented Jan 13, 2018

I got my board with 10k pullups and changed them to 1.5k but the problem remains the same.
It works reliable using GCC V4.9.3 20150529 and unreliable using GCC V6.3.1 20170620.
Both using exactly the same compilation settings (see my makefile above).

Regards,
lakeroe

@radiolee
Copy link

@lakeroe I'm not sure I saw 2016 or 2017 version @LinuxCNC used, maybe in gnea/grbl ARM discuess, but can't find it now.
Did you try CooCox IDE with latest gcc? I was try cygwin but don't know howto:relieved:

@usbcnc
Copy link
Owner

usbcnc commented Jan 13, 2018

I am not expert of tool chain. I use what works for me for now. If one works and the other does not , I can assume the other one has a bug or so. If you really want, disassemble the elf and find the difference.

@usbcnc
Copy link
Owner

usbcnc commented Jan 13, 2018

I have noticed in the past some compiler will optimize differently the following code.
While(some flag!=true)
;
The code is meant to wait the flag change, the change of flag is in interrupt routine.
There are two places in serial.c that use this kind of code.
If they are improperly optimized, then serial code will not reliablely work.
Suggest to use something like #progma to disable the optimize around those two places.

@usbcnc
Copy link
Owner

usbcnc commented Jan 13, 2018

O3 might work better than optimize for size.

@usbcnc
Copy link
Owner

usbcnc commented Jan 13, 2018

I do have latest coocox installed but not the latest tool chain. I still prefer to use older ide since I get used to it. Do not find need to have to use the latest one.

@radiolee
Copy link

Thanks for ur reply. I'm still use old version

@lakeroe
Copy link
Author

lakeroe commented Jan 14, 2018

There are two places in serial.c that use this kind of code.
If they are improperly optimized, then serial code will not reliablely work.
Suggest to use something like #progma to disable the optimize around those two places.

I don't use the serial interface, but USB instead. So in my understanding this should not be a problem.

O3 might work better than optimize for size.

As I wrote in my first post I already tried different optimization levels without luck ...

@radiolee
Copy link

I guess @usbcnc point out While(some flag!=true) in serial.c maybe differently optimized by latest gcc, and the point is in interrupt routine. If some diff in interrupt, it may makes random issues.

@usbcnc
Copy link
Owner

usbcnc commented Jan 14, 2018

Try to disable compile option and compare the result between the two compilers. I do not have installed the latest tool.

@lakeroe
Copy link
Author

lakeroe commented Jan 14, 2018

I'd love to find the root cause of the problem, but I think my knowledge is not enough for that.
I just compared the result between the two compilers (the .ASM files) and a simple text compare shows hundreds (or thousands) of differences. Maybe there's a better approach to find the differences ?

@usbcnc
Copy link
Owner

usbcnc commented Jan 14, 2018

It should related to the serial port code a(USB). You can try to user real serial port (not USB VCP) to see if both tool works the same.

@lakeroe
Copy link
Author

lakeroe commented Jan 15, 2018

Today I could narrow down the problem and @usbcnc you're right: The problem seems related to the USB VCP code.

Real serial port (using an external USB-RS232-converter from FTDI) works fine.
USB VCP and compiler option -O0 -> works fine
USB VCP and compiler option -O1 -> works fine
USB VCP and compiler option -O2 -> connection unreliable
USB VCP and compiler option -Os -> connection unreliable

So it really seems a compiler optimization problem.

I try to further isolate the problem, but this will take some time.
If you have any further idea, please let me know ...

@usbcnc
Copy link
Owner

usbcnc commented Jan 16, 2018

Have you tried O3?

@lakeroe
Copy link
Author

lakeroe commented Jan 17, 2018

Yes I tried -O3, it doesn't work either.

The problem seems to be the -fschedule-insns2 optimization flag which is automatically enabled at -O2, -O3 and -Os.
By adding -fno-schedule-insns2 in addition to -Os the code works reliable.
I already compared the two generated codes (*.ASM files), and they are quite different.

This leads to the question, what's the root of the problem:
*) a compiler bug ?
*) bad c coding style ?
*) any other ideas ?
*) ...

@usbcnc
Copy link
Owner

usbcnc commented Jan 17, 2018

I would check the code generated for serial.c and usb_endp.c.
Possible reason are

  1. Synchronization.
  2. (This is a bug in all VCP code that a zero byte needs to send to host if multiple of 64 bytes are send to the host and no further IN bytes are send). This however should not be affected by the optimization level.

@usbcnc
Copy link
Owner

usbcnc commented Jan 17, 2018

main.c has no issue. As I said in serial.c and usb_endp.c

@lakeroe
Copy link
Author

lakeroe commented Jan 17, 2018

Sorry, I noticed that myself and here are the correct files again:

serial-not-okay.asm
serial-ok.asm
usb_endp-not-okay.asm
usb_endp-ok.asm

@usbcnc
Copy link
Owner

usbcnc commented Jan 17, 2018

I think there is a bug in the compiler.
Look at serial-not-okay.asm
1a: 70d3 strb r3, [r2, #3]
1c: 5460 strb r0, [r4, r1]

Wrong order. Should swap the lines.
But in the ok asm
1a: 5460 strb r0, [r4, r1]
1c: 70d3 strb r3, [r2, #3]

The c code is
serial_tx_buffer[serial_tx_buffer_head] = data;

serial_tx_buffer_head = next_head;

Which to me the OK one generate the right code.
You can go to binary code and modify those two places and see if the problem disappears.
The order is wrong in here. There maybe other places that the compiler generate the wrong code,

@usbcnc
Copy link
Owner

usbcnc commented Jan 17, 2018

In the above code, the order is VERY important, if the order is wrong the result can be unpredictable.

@usbcnc
Copy link
Owner

usbcnc commented Jan 17, 2018

Try change to this and see if the problem is fixed.
__disable_irq();
// Store data and advance head
serial_tx_buffer[serial_tx_buffer_head] = data;

serial_tx_buffer_head = next_head;
__enable_irq();

@lakeroe
Copy link
Author

lakeroe commented Jan 17, 2018

By disabling the IRQs the program works fine now.
I just ran a ~2h program without a problem.
The two lines are still in the wrong order though.

Thanks for your help !

Maybe it's a good idea to update the git sources ...

@usbcnc
Copy link
Owner

usbcnc commented Jan 17, 2018

The original source is good. But the optimization has issue not the code itself.
Disable the IRQ just to prove that optimization does the wrong thing and should not be trusted.

@usbcnc
Copy link
Owner

usbcnc commented Jan 17, 2018

Original GRBL is carefully written and these two lines have a good reason to be in this order. If compiler changes the order, that means it cannot be trusted.
I think there is a way to work around this issue but is it worth the effort?

Original code says "// Store data and advance head"
But the compiler made it "advance the head and store the data". Even though the data is stored in the right place, but it is in the wrong timing.

@radiolee
Copy link

https://developer.arm.com/open-source/gnu-toolchain/gnu-rm/downloads

why not try 2017q4 or other version to check if the problem fixed?

@lakeroe
Copy link
Author

lakeroe commented Jan 18, 2018

I just tried the new gcc (2017q4) and the problem is still the same.
So in your opinion is adding the -fno-schedule-insns2 parameter the best way to solve this issue for newer compilers ?

And one more question just for my understanding:
Even if the compiler changes those two lines (which in my understanding causes the data is stored in the WRONG place) why is it working if IRQs are disabled ?

@radiolee
Copy link

@lakeroe I think disable irq just a debug code, bypass something to test. I really don't know what's different use -fno-schedule-insns2 so can't help.

@usbcnc
Copy link
Owner

usbcnc commented Jan 18, 2018

@usbcnc
Copy link
Owner

usbcnc commented Jan 18, 2018

@lakeroe, I do not know your educational background. If you look at the usb_endp.c which use the flag (in ISR), you should know this order is very important. Disable the IRQ means no interrupt will happen between two lines and the problem will not happen. The problem will happen ONLY if the interrupt is happening between those two lines.

@lakeroe
Copy link
Author

lakeroe commented Jan 18, 2018

I've learnt programming microcontrollers in C 25 years ago and apart from some hobby projects my knowledge is quite basic. I'm currently working as an electronics hardware developer (mainly microcontroller circuits).

Thanks for the explanation, I think I understand the problem now.

Regarding the thread you mentioned above:
This one is from 2009 and the problem still persists in newer gcc versions - sounds very strange to me.

Anyway, my solution is to use your unchanged code and add the -fno-schedule-insns2 flag.

@usbcnc
Copy link
Owner

usbcnc commented Jan 18, 2018

I have read and I do not believe this is a bug for compiler. It just means in this case, this optimization needs to be disabled for those two lines. I think there is a better way to disable this option only for those two lines. -fno-schedule-insns should go with -fno-schedule-insns2. They both do so called "instruction scheduling".

@lakeroe
Copy link
Author

lakeroe commented Jan 18, 2018

By adding "attribute((optimize("-O0")))" it should be possible to completely deactivate optimizing for a specific function, but I think that's not exactly what we want ...

@usbcnc
Copy link
Owner

usbcnc commented Jan 18, 2018

@lakeroe that will do the trick and should be used. It is better than disable the IRQ.

@lakeroe
Copy link
Author

lakeroe commented Jan 20, 2018

Just for your information:
I tried the attribute thingy from above and it also works without problems ...

Do you think that should be added to your source ?

@usbcnc
Copy link
Owner

usbcnc commented Jan 20, 2018

@lakeroe Yes I think so.

@lakeroe
Copy link
Author

lakeroe commented Feb 2, 2018

Just for your information:

__attribute__((optimize("-O1"))) works also fine

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants