Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Core] Modify the low frequency log level of some critical path from DEBUG to INFO or WARNING #14205

Closed
wants to merge 2 commits into from

Conversation

wumuzi520
Copy link
Contributor

Why are these changes needed?

  • Problem

    • In the production environment, the log level of the engine is usually set to INFO, but some low-frequency key logs are currently DEBUG, which makes it extremely difficult to debug problems.
    • In the production environment, it is occasionally found that some subscription messages are not received by Rayet, which is suspected to be caused by subscription failure. However, it is difficult to confirm because the log of subscription success is not added.
  • Solution

    • Modify the low frequency log level of some critical path from DEBUG to INFO or WARNING.
    • Add INFO log when succeed in subscribing some topics.

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@wumuzi520 wumuzi520 changed the title Modify the low frequency log level of some critical path from DEBUG to INFO or WARNING [Core] Modify the low frequency log level of some critical path from DEBUG to INFO or WARNING Feb 19, 2021
@rkooo567 rkooo567 self-assigned this Feb 19, 2021
@rkooo567
Copy link
Contributor

rkooo567 commented Feb 19, 2021

Can we avoid using WARNING? We shouldn't use it unless the message will clearly indicate the symptoms of the warning and action items because they are streamed to user drivers. It is desirable not to print impl details. But anyway I will review the PR soon.

@@ -313,8 +313,8 @@ Status GcsActorManager::CreateActor(const ray::rpc::CreateActorRequest &request,

auto iter = registered_actors_.find(actor_id);
if (iter == registered_actors_.end()) {
RAY_LOG(DEBUG) << "Actor " << actor_id
<< " may be already destroyed, job id = " << actor_id.JobId();
RAY_LOG(WARNING) << "Actor " << actor_id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change it to INFO

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's nothing users can do when they look at this log.

@@ -834,8 +834,8 @@ void GcsActorManager::Initialize(const GcsInitData &gcs_init_data) {
// Notify raylets to release unused workers.
gcs_actor_scheduler_->ReleaseUnusedWorkers(node_to_workers);

RAY_LOG(DEBUG) << "The number of registered actors is " << registered_actors_.size()
<< ", and the number of created actors is " << created_actors_.size();
RAY_LOG(INFO) << "The number of registered actors is " << registered_actors_.size()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of changing this to INFO, can we have more generalized log messages in the initialize step? Like we are doing it for DebugString

@@ -49,7 +49,7 @@ void GcsResourceManager::HandleUpdateResources(
const rpc::UpdateResourcesRequest &request, rpc::UpdateResourcesReply *reply,
rpc::SendReplyCallback send_reply_callback) {
NodeID node_id = NodeID::FromBinary(request.node_id());
RAY_LOG(DEBUG) << "Updating resources, node id = " << node_id;
RAY_LOG(INFO) << "Updating resources, node id = " << node_id;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is high frequency isn't it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be updated every 100ms

@@ -84,7 +84,7 @@ void GcsResourceManager::HandleUpdateResources(
nullptr));

GCS_RPC_SEND_REPLY(send_reply_callback, reply, status);
RAY_LOG(DEBUG) << "Finished updating resources, node id = " << node_id;
RAY_LOG(INFO) << "Finished updating resources, node id = " << node_id;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

@@ -101,7 +101,7 @@ void GcsResourceManager::HandleDeleteResources(
const rpc::DeleteResourcesRequest &request, rpc::DeleteResourcesReply *reply,
rpc::SendReplyCallback send_reply_callback) {
NodeID node_id = NodeID::FromBinary(request.node_id());
RAY_LOG(DEBUG) << "Deleting node resources, node id = " << node_id;
RAY_LOG(INFO) << "Deleting node resources, node id = " << node_id;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we use this endpoint anymore. Isn't it for dymanic resources? It is deprecated.

@@ -135,7 +135,7 @@ Process WorkerPool::StartWorkerProcess(
RAY_CHECK(!job_id.IsNil());
auto it = all_jobs_.find(job_id);
if (it == all_jobs_.end()) {
RAY_LOG(DEBUG) << "Job config of job " << job_id << " are not local yet.";
RAY_LOG(WARNING) << "Job config of job " << job_id << " are not local yet.";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also shouldn't be warning.

RAY_LOG(DEBUG) << "Worker not started, " << starting_workers
<< " workers of language type " << static_cast<int>(language)
<< " pending registration";
RAY_LOG(WARNING) << "Worker not started, " << starting_workers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also shouldn't be warning. There's no way any user will know what to do after looking at this from their driver.

@@ -283,8 +283,8 @@ Process WorkerPool::StartWorkerProcess(
auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
stats::ProcessStartupTimeMs.Record(duration.count());

RAY_LOG(DEBUG) << "Started worker process of " << workers_to_start
<< " worker(s) with pid " << proc.GetId();
RAY_LOG(INFO) << "Started worker process of " << workers_to_start
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

STarting -> Started -> Registered is duplicated log. Can you just have one log message?

@@ -736,8 +747,8 @@ void NodeManager::NodeRemoved(const NodeID &node_id) {

// Remove the node from the resource map.
if (!cluster_resource_scheduler_->RemoveNode(node_id.Binary())) {
RAY_LOG(DEBUG) << "Received NodeRemoved callback for an unknown node: " << node_id
<< ".";
RAY_LOG(WARNING) << "Received NodeRemoved callback for an unknown node: " << node_id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please indicate what's the side effect of this warning message.

RAY_LOG(DEBUG) << "[ResourceCreateUpdated] received callback from node id " << node_id
<< " with created or updated resources: "
<< createUpdatedResources.ToString() << ". Updating resource map.";
RAY_LOG(INFO) << "[ResourceCreateUpdated] received callback from node id " << node_id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the log message is too low level. Same as below.

@rkooo567 rkooo567 added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Feb 24, 2021
@rkooo567 rkooo567 removed their assignment Nov 23, 2021
@bveeramani
Copy link
Member

‼️ ACTION REQUIRED ‼️

We've switched our code formatter from YAPF to Black (see #21311).

To prevent issues with merging your code, here's what you'll need to do:

  1. Install Black
pip install -I black==21.12b0
  1. Format changed files with Black
curl -o format-changed.sh https://gist.githubusercontent.com/bveeramani/42ef0e9e387b755a8a735b084af976f2/raw/7631276790765d555c423b8db2b679fd957b984a/format-changed.sh
chmod +x ./format-changed.sh
./format-changed.sh
rm format-changed.sh
  1. Commit your changes.
git add --all
git commit -m "Format Python code with Black"
  1. Merge master into your branch.
git pull upstream master
  1. Resolve merge conflicts (if necessary).

After running these steps, you'll have the updated format.sh.

@kfstorm
Copy link
Member

kfstorm commented Mar 13, 2022

‼️ ACTION REQUIRED ‼️

We've updated our formatting configuration for C++ code. (see #22725)

This PR includes C++ code change. To prevent issues with merging your code, here's what you'll need to do:

  1. Merge the latest changes from upstream/master branch into your branch.
git pull upstream master
git merge upstream/master
  1. Resolve merge conflicts (if necessary).

After running these steps, you'll have the updated C++ formatting configuration.

  1. Format changed files.
scripts/format.sh
  1. Commit your changes.
git add --all
git commit -m "Format C++ code"

@stale
Copy link

stale bot commented Apr 16, 2022

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

  • If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@stale stale bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Apr 16, 2022
@stale
Copy link

stale bot commented Jul 30, 2022

Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message.

Please feel free to reopen or open a new issue if you'd still like it to be addressed.

Again, you can always ask for help on our discussion forum or Ray's public slack channel.

Thanks again for opening the issue!

@stale stale bot closed this Jul 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. stale The issue is stale. It will be closed within 7 days unless there are further conversation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants