This repository has been archived by the owner on Jul 21, 2024. It is now read-only.

2024 Beta RIO 1 Out-Of-Memory's after some deploys #39

Open
CoryNessCTR opened this issue Oct 19, 2023 · 14 comments

Comments

@CoryNessCTR
Contributor

Describe the bug
After a couple of Java project deploys on a roboRIO 1, the DS will report an out-of-memory error.

To Reproduce
Steps to reproduce the behavior:

  1. Format/power cycle roboRIO 1
  2. Create a new Timed Robot Skeleton Java Project
  3. Construct a Talon object with PWM channel 0
  4. Deploy the project to the roboRIO 1 (a terminal equivalent is sketched after the error output)
  5. Increment channel
  6. Repeat steps 4-5
  7. Eventually (in fewer than 10 repeats), get the following error:
OpenJDK Client VM warning: INFO: os::commit_memory(0xb0000000, 4194304, 0) failed; error='Not enough space' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 4194304 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /tmp/hs_err_pid7540.log
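
For reference, the deploy in step 4 can also be run from a terminal in the project folder. This is a minimal sketch assuming a standard GradleRIO project generated by the WPILib VS Code extension; on Windows, use gradlew.bat in place of ./gradlew.

# Build and deploy the robot program to the connected roboRIO
./gradlew deploy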

Expected behavior
The out-of-memory error does not occur.

Desktop (please complete the following information):

  • OS: Windows 10
  • Project Information:
WPILib Information:
Project Version: 2024.1.1-beta-1
VS Code Version: 1.83.1
WPILib Extension Version: 2024.1.1-beta-1
C++ Extension Version: 1.17.5
Java Extension Version: 1.23.0
Java Debug Extension Version: 0.52.0
Java Dependencies Extension Version: 0.23.0
Java Version: 17
Java Location: C:\Users\Public\wpilib\2024\jdk
Vendor Libraries:
   WPILib-New-Commands (1.0.0)

Additional context
I collected memory information before and after each deploy, available as a zip below:
Deploy 0 was collected immediately after power cycling the roboRIO; Deploy 5 was collected after the out-of-memory error occurred.
MemoryIssues.zip
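
For anyone who wants to collect similar data, here is a rough sketch of one way to grab a memory snapshot over SSH after each deploy. It assumes the roboRIO is reachable at roboRIO-TEAM-frc.local (substitute your team number) and uses only /proc/meminfo and ps, which are available on the RIO's Linux image; the files in the zip may have been collected differently.

# Save overall memory usage and the process list to a file after a deploy
ssh [email protected] 'cat /proc/meminfo; ps' > deploy1_memory.txt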

I've also attached the log file of the out of memory error:
hs_err_pid7540.log

I've also repeated this experiment on the 2023_v3.2 image for comparison and stopped testing after 30 consecutive deploys without issue. This appears to be a new or worsened issue in the 2024 libraries.

@rzblue transferred this issue from wpilibsuite/allwpilib on Dec 4, 2023
@EyalKeysar

EyalKeysar commented Dec 7, 2023

We have also encountered this issue, and we found a way to solve it temporarily until NI releases an update.
I want to clarify that this solution is not official.

From what we understand, the issue is caused by multiple leftover processes that each take a lot of memory.
To see which processes are currently running on the roboRIO, first connect to it over SSH (https://docs.wpilib.org/en/stable/docs/software/roborio-info/roborio-ssh.html).
Once connected, you can view the running processes with the "top" command (https://man7.org/linux/man-pages/man1/top.1.html).
You will see that a few processes take more memory than the others. These are the processes started by each deploy; ideally they should stop running when you deploy again, but they don't, which is why you get this error after a few deploys.
Our workaround is to kill these leftover processes whenever the error appears.
To find the specific processes to kill, we filter top's output with "grep":
top | grep "JRE"
This prints every process whose "top" entry contains "JRE". Note the PIDs (process IDs) of those processes.
Then kill each one with the "kill" command (https://man7.org/linux/man-pages/man1/kill.1.html); for example, if the PID is 2230:
kill -9 2230
Run this command for every PID reported by the filtered top command (top | grep "JRE").
This should solve the problem.
In this example the PID is 4962:
[screenshot of the filtered top output showing the JRE process]
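
Putting the steps above together, here is a minimal sketch of the whole workaround as run from an SSH session on the roboRIO. The -b (batch) and -n 1 flags are assumed to be supported by the RIO's top so its output can be piped cleanly, and the PID is the example value from the screenshot above; substitute the PIDs your own filtered listing reports.

# List leftover JRE processes and note their PIDs
top -b -n 1 | grep "JRE"
# Force-kill one leftover process by its PID (repeat for each PID found)
kill -9 4962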

@calcmogul
Member

calcmogul commented Dec 7, 2023

You could use this instead to force-kill all processes with JRE in their name:

pgrep JRE | xargs kill -9

The following may work for remote kill, but I haven't tested whether ssh allows embedding pipes like that.

ssh [email protected] 'pgrep JRE | xargs kill -9'

@EyalKeysar

When connected to the roboRIO over USB, the IP address to SSH to is 172.22.11.2.
(https://docs.wpilib.org/he/stable/docs/software/roborio-info/roborio-ssh.html)
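
Combining this with the pgrep one-liner above, an untested sketch of the same remote kill over the USB connection would be:

# Force-kill all leftover JRE processes over the USB link (untested)
ssh [email protected] 'pgrep JRE | xargs kill -9'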

@aaronleetw

We are also getting this issue. I'll test the remote kill ssh command.

@Crossle86

The problem in #40 is likely related, but the symptoms are not quite the same. Some observations:
Killing the JRE process just causes another one to be started in its place. From what I can see, that restart sometimes helps and sometimes doesn't.

The code download always seems to succeed; the problem appears to be in the startup of the code. Sometimes it starts fine, but most other times it starts with garbage in the riolog, an incomplete riolog, or an apparently good startup that then logs lots of errors from CAN devices. Powering off and back on works.

@aaronleetw

I'm not sure if it is related, but the "Restart Robot Code" option also does not work regardless of its state.

@sciencewhiz
Contributor

Does this still occur with the WPILib beta 4?

@Crossle86

Have not had a chance to test B4 yet. Not sure when I can do it now that xmas is here. Will try sometime next week.

@stephenjust

I'm still reproducing this on the Kickoff release

@aaronleetw

I am still reproducing this issue, albeit much less often, on the kickoff release. After three days of testing, it failed once.

@Crossle86

Our team has not seen any problems with deployment since kickoff release.

@JaiCode08

JaiCode08 commented Feb 22, 2024

Hello. This issue is still occurring for me. I'm not doing any heavy logging or heavy computation on the roboRIO. The memory leaks happened occasionally in WPILib 2024.2.1 but have gotten worse with 2024.3.1. The roboRIO is on the latest firmware.

@nkalupahana

We're also having this issue whenever we add any sort of logging to our code: https://github.com/FRC-7525/2024-Robot

@Crossle86

An update on this for our team: we stopped having the fail-to-deploy issue and things seemed normal until we started loading autos created with PathPlanner. With only a couple of autos we started getting out-of-memory errors, to the point that we bit the bullet and took a RIO 2 out of last year's robot, which solved the out-of-memory issue. I was being cheap trying to use a RIO 1 for this year's robot.
