Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Core][2/n] Core structured logging: add Text formatter and pass log config to worker process #45344

Merged
merged 23 commits into from
Jun 4, 2024

Conversation

kevin85421
Copy link
Member

@kevin85421 kevin85421 commented May 15, 2024

Why are these changes needed?

  • Add a TextFormatter. It is a formatter that converts LogRecord into a human-readable message string.
  • Add a predefined logging configuration. This allows users to easily configure Ray task and actor loggers to use TextFormatter, and to add arbitrary attributes to log messages.

Example (outdated)

  • Example

    import ray
    import sys
    import logging
    from ray._private.structured_logging.utils import LoggingConfig
    
    ray.init(
        job_config=ray.job_config.JobConfig(py_log_config=LoggingConfig("TEXT"))
    )
    
    def init_logger():
        return logging.getLogger()
    
    logger = logging.getLogger("ray")
    logger.info("Driver process", extra={"username": "johndoe"})
    
    @ray.remote
    def f():
        logger = init_logger()
        logger.info("A Ray task")
    
    @ray.remote
    class actor:
        def __init__(self):
            pass
        def print_message(self):
            logger = init_logger()
            logger.info("A Ray actor")
    
    task_obj_ref = f.remote()
    ray.get(task_obj_ref)
    
    actor_instance = actor.remote()
    ray.get(actor_instance.print_message.remote())
  • Output

    2024-05-21 06:48:06,699 INFO worker.py:1740 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8266
    2024-05-21 06:48:07,333 INFO test_logging.py:14 -- Driver process
    (f pid=3772046) 2024-05-21 06:48:07,349 INFO test_logging.py:19 -- A Ray task job_id=01000000 worker_id=87cac8ad9f50f185f97d0fc36e05616669c3e00a135bc0f443a709ac node_id=683c682fee5d431f3f4cf25fa992df9a7918b1c32e88943e853ad06b task_id=c8ef45ccd0112571ffffffffffffffffffffffff01000000
    (actor pid=3773875) 2024-05-21 06:48:07,870     INFO test_logging.py:27 -- A Ray actor job_id=01000000 worker_id=b362329f5dab902df9e3fc0fe73df6279c661baaa709854e135378aa node_id=683c682fee5d431f3f4cf25fa992df9a7918b1c32e88943e853ad06b actor_id=c4fc669d536be18637fa490201000000 task_id=c2668a65bda616c1c4fc669d536be18637fa490201000000
    

Example

  • Example

    import ray
    import logging
    
    ray.init(
        logging_config = ray.LoggingConfig("TEXT")
    )
    
    def init_logger():
        return logging.getLogger()
    
    logger = logging.getLogger("ray")
    logger.info("Driver process ray logger", extra={"username": "johndoe"})
    
    root_logger = logging.getLogger()
    root_logger.info("Driver process root logger")
    
    @ray.remote
    def f():
        logger = init_logger()
        logger.info("A Ray task")
    
    task_obj_ref = f.remote()
    ray.get(task_obj_ref)
    
    @ray.remote
    class A:
        def f(self):
            logger = init_logger()
            logger.info("A Ray actor task")
    
    actor_instance = A.remote()
    actor_task_obj_ref = actor_instance.f.remote()
    ray.get(actor_task_obj_ref)
  • Output

    2024-06-01 18:10:33,299 INFO worker.py:1756 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
    2024-06-01 18:10:33,878 INFO test_logging.py:12 -- Driver process ray logger
    2024-06-01 18:10:33,878 INFO test_logging.py:15 -- Driver process root logger job_id=01000000 worker_id=01000000ffffffffffffffffffffffffffffffffffffffffffffffff node_id=f7acefc688d7648f921fb39d471bb79850f09b1a1529a551d8cb8c48
    (f pid=1331481) 2024-06-01 18:10:33,894 INFO test_logging.py:20 -- A Ray task job_id=01000000 worker_id=5cc246242da76705d432786c2a114375f5ab082d57c85af5a84338cc node_id=f7acefc688d7648f921fb39d471bb79850f09b1a1529a551d8cb8c48 task_id=c8ef45ccd0112571ffffffffffffffffffffffff01000000
    (A pid=1333316) 2024-06-01 18:10:34,419 INFO test_logging.py:29 -- A Ray actor task job_id=01000000 worker_id=c99d80fd988eacdbc52e0768a2ed511690889b13041254a7d3f5c7da node_id=f7acefc688d7648f921fb39d471bb79850f09b1a1529a551d8cb8c48 actor_id=af0d2089deb37b15334cad9401000000 task_id=c2668a65bda616c1af0d2089deb37b15334cad9401000000
    

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@kevin85421 kevin85421 requested review from ericl, pcmoritz, raulchen and a team as code owners May 15, 2024 01:49
@kevin85421 kevin85421 changed the title [Core][2/n] Core structured logging: add logfmt formatter and passing log config to worker process [Core][2/n] Core structured logging: add logfmt formatter and pass log config to worker process May 15, 2024
@kevin85421 kevin85421 marked this pull request as draft May 15, 2024 01:50
@kevin85421 kevin85421 changed the title [Core][2/n] Core structured logging: add logfmt formatter and pass log config to worker process [WIP][Core][2/n] Core structured logging: add logfmt formatter and pass log config to worker process May 15, 2024
python/ray/tests/test_structured_logging.py Show resolved Hide resolved
},
"handlers": {
"console": {
"level": "INFO",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

},
"filters": {
"core_context": {
"()": CoreContextFilter,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"The key '()' has been used as the special key because it is not a valid keyword parameter name, and so will not clash with the names of the keyword arguments used in the call. The '()' also serves as a mnemonic that the corresponding value is a callable."

Ref: https://docs.python.org/3/library/logging.config.html#logging-config-dictschema

python/ray/_private/structured_logging/utils.py Outdated Show resolved Hide resolved
python/ray/tests/test_structured_logging.py Outdated Show resolved Hide resolved
python/ray/_private/structured_logging/formatters.py Outdated Show resolved Hide resolved
@@ -50,6 +51,7 @@ def __init__(
ray_namespace: Optional[str] = None,
default_actor_lifetime: str = "non_detached",
_py_driver_sys_path: Optional[List[str]] = None,
py_log_config: Optional[Union[dict, str]] = None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's create a LoggingConfig class

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

class LoggingConfig:
    dict_config: Union[dict, str] = "TEXT"
    log_level: int = INFO

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason to create a LoggingConfig class? With LoggingConfig, user's code will look like:

ray.init(job_config=ray.job_config.JobConfig(py_log_config=ray.xxx.yyy.zzz.LoggingConfig("TEXT")))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason to create a LoggingConfig class?

Maybe this is because we might add new fields to the LoggingConfig in the future?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated bba3b09

# A dictionary to map encoding types to their corresponding logging configurations.
LOG_MODE_DICT = {
"TEXT": {
"version": 1,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"version - to be set to an integer value representing the schema version. The only valid value at present is 1, but having this key allows the schema to evolve while still preserving backwards compatibility."

https://docs.python.org/3/library/logging.config.html

def generate_record_format_attrs(
formatter: logging.Formatter,
record: logging.LogRecord,
exclude_standard_attrs=False,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have decided to use exclude_standard_attrs instead of include_standard_attrs or include_extra_attrs because I think exclude_standard_attrs is more accurate.

  • include_standard_attrs: This function doesn't include all standard LogRecrod attributes.
  • include_extra_attrs: This function not only include attributes specified by logger.info(..., extra={...}) but also Ray context.

LOG_MODE_DICT = {
"TEXT": {
"version": 1,
"disable_existing_loggers": False,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

disable_existing_loggers – If specified as False, loggers which exist when this call is made are left enabled. The default is True because this enables old behaviour in a backward-compatible way. This behaviour is to disable any existing non-root loggers unless they or their ancestors are explicitly named in the logging configuration.

https://docs.python.org/3/library/logging.config.html

@kevin85421 kevin85421 changed the title [WIP][Core][2/n] Core structured logging: add logfmt formatter and pass log config to worker process [Core][2/n] Core structured logging: add logfmt formatter and pass log config to worker process May 21, 2024
@kevin85421 kevin85421 changed the title [Core][2/n] Core structured logging: add logfmt formatter and pass log config to worker process [Core][2/n] Core structured logging: add Text formatter and pass log config to worker process May 21, 2024
@kevin85421 kevin85421 marked this pull request as ready for review May 21, 2024 06:50
@kevin85421 kevin85421 added the go add ONLY when ready to merge, run all tests label May 21, 2024
python/ray/_private/structured_logging/formatters.py Outdated Show resolved Hide resolved
python/ray/_private/structured_logging/formatters.py Outdated Show resolved Hide resolved
python/ray/job_config.py Outdated Show resolved Hide resolved
@@ -226,4 +233,5 @@ def from_json(cls, job_config_json):
ray_namespace=job_config_json.get("ray_namespace", None),
_client_job=job_config_json.get("client_job", False),
_py_driver_sys_path=job_config_json.get("py_driver_sys_path", None),
py_log_config=job_config_json.get("py_log_config", None),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what we get from job_config_json won't have type LoggingConfig. I think we don't need to support json for now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

class LoggingConfig:
def __init__(self, log_config: Union[dict, str]):
if isinstance(log_config, str):
assert log_config in LOG_MODE_DICT, (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should raise ValueError to the user instead of check fail

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated a38ec10



class LoggingConfig:
def __init__(self, log_config: Union[dict, str]):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed, let's do

class LoggingConfig:
    dict_config: Union[dict, str] = "TEXT"
    # Only effective if dict_config is not a dict
    log_level: int = INFO

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated a38ec10

python/ray/_private/structured_logging/utils.py Outdated Show resolved Hide resolved
python/ray/_private/structured_logging/utils.py Outdated Show resolved Hide resolved
Signed-off-by: kaihsun <[email protected]>
Signed-off-by: kaihsun <[email protected]>
Signed-off-by: kaihsun <[email protected]>
Signed-off-by: kaihsun <[email protected]>
Signed-off-by: kaihsun <[email protected]>
Signed-off-by: kaihsun <[email protected]>
Signed-off-by: kaihsun <[email protected]>
Signed-off-by: kaihsun <[email protected]>
Signed-off-by: kaihsun <[email protected]>
Signed-off-by: kaihsun <[email protected]>
Copy link
Collaborator

@jjyao jjyao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lg

python/ray/_private/structured_logging/utils.py Outdated Show resolved Hide resolved
python/ray/_private/structured_logging/utils.py Outdated Show resolved Hide resolved
python/ray/_private/structured_logging/utils.py Outdated Show resolved Hide resolved
@jjyao
Copy link
Collaborator

jjyao commented May 24, 2024

@rkooo567 can you also check the API to see if it makes sense?

Signed-off-by: kaihsun <[email protected]>
Signed-off-by: kaihsun <[email protected]>
from ray._private.structured_logging.formatters import TextFormatter
from ray._private.structured_logging.filters import CoreContextFilter

LOG_MODE_DICT = {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think we can just define this in the constants.py?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving to constants.py will cause a circular import.

  • LOG_MODE_DICT requires TextFormatter and CoreContextFilter from formatters.py and filters.py.
  • Both filters.py and formatters.py also import constants.py.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated 2ec1b4f

from ray.util.annotations import PublicAPI

if TYPE_CHECKING:
from ray.runtime_env import RuntimeEnv


@PublicAPI(stability="alpha")
class LoggingConfig:
def __init__(self, log_config: Union[dict, str] = "TEXT", log_level: str = "INFO"):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log_config -> dict_config? to make it clear that we use python dictConfig to configure the logging.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated d9e63aa

py_logging_config = pickle.loads(serialized_py_logging_config)
log_config_dict = py_logging_config.get_dict_config()
if log_config_dict:
logging.config.dictConfig(log_config_dict)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to try catch. See

if worker_process_setup_hook_key:
        error = load_and_execute_setup_hook(worker_process_setup_hook_key)
        if error is not None:
            worker.core_worker.drain_and_exit_worker("system", error)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated d9e63aa

Signed-off-by: kaihsun <[email protected]>
except Exception as e:
backtrace = \
"".join(traceback.format_exception(type(e), e, e.__traceback__))
print(backtrace)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't add print here, users can't see any error message from the driver process. The following screenshot is an example of backtrace.

 import ray
 import logging
 from ray.job_config import LoggingConfig

 ray.init(
     job_config=ray.job_config.JobConfig(py_logging_config=LoggingConfig({"abc": "123"}))
 )

 def init_logger():
     return logging.getLogger()

 logger = logging.getLogger("ray")
 logger.info("Driver process", extra={"username": "johndoe"})

 @ray.remote
 def f():
     logger = init_logger()
     logger.info("A Ray task")

 task_obj_ref = f.remote()
 ray.get(task_obj_ref)
Screenshot 2024-05-27 at 11 58 05 PM

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove this print() for now to be consistent with the error handling of worker_process_startup_hook. Users are able to know the worker crash reason from ray list workers and ActorDiedException

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove this print() for now to be consistent with the error handling of worker_process_startup_hook. Users are able to know the worker crash reason from ray list workers and ActorDiedException

I don't think any user can troubleshoot this kind of issue on their own before we configure the root logger of the driver process. It's OK to remove print here because:

  • We currently only support encoding and verify it during its initialization process. Hence, it is impossible to go into this except statement.
  • We have currently decided to configure the root logger of the driver process, so the error message will also be printed in the driver process, even though we don't add print(backtrace).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated 36dd127

@kevin85421
Copy link
Member Author

  • Case 1: pb.serialized_py_logging_config is 428 bytes.

    log_dict_config = {
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {
            "text": {
                "()": "ray._private.structured_logging.formatters.TextFormatter",
            },
        },
        "filters": {
            "core_context": {
                "()": "ray._private.structured_logging.filters.CoreContextFilter",
            },
        },
        "handlers": {
            "console": {
                "level": "INFO",
                "class": "logging.StreamHandler",
                "formatter": "text",
                "filters": ["core_context"],
            },
        },
        "root": {
            "level": "INFO",
            "handlers": ["console"],
        },
    }
    ray.init(
         job_config=ray.job_config.JobConfig(py_logging_config=LoggingConfig(dict_config))
    )
  • Case 2: pb.serialized_py_logging_config is 95 bytes.

    ray.init(
         job_config=ray.job_config.JobConfig(py_logging_config=LoggingConfig("TEXT"))
    )

Copy link
Collaborator

@jjyao jjyao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lg

python/ray/_private/structured_logging/logging_config.py Outdated Show resolved Hide resolved
python/ray/_private/structured_logging/logging_config.py Outdated Show resolved Hide resolved
python/ray/_private/structured_logging/logging_config.py Outdated Show resolved Hide resolved
python/ray/_private/structured_logging/logging_config.py Outdated Show resolved Hide resolved
python/ray/_private/worker.py Outdated Show resolved Hide resolved
python/ray/_private/worker.py Outdated Show resolved Hide resolved
python/ray/job_config.py Outdated Show resolved Hide resolved
python/ray/job_config.py Outdated Show resolved Hide resolved
src/ray/protobuf/common.proto Outdated Show resolved Hide resolved
Signed-off-by: kaihsun <[email protected]>
@kevin85421 kevin85421 removed the go add ONLY when ready to merge, run all tests label Jun 3, 2024
Signed-off-by: kaihsun <[email protected]>
Signed-off-by: kaihsun <[email protected]>
python/ray/_private/structured_logging/logging_config.py Outdated Show resolved Hide resolved
except Exception as e:
backtrace = \
"".join(traceback.format_exception(type(e), e, e.__traceback__))
print(backtrace)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove this print() for now to be consistent with the error handling of worker_process_startup_hook. Users are able to know the worker crash reason from ray list workers and ActorDiedException

python/ray/tests/test_structured_logging.py Outdated Show resolved Hide resolved
python/ray/tests/test_structured_logging.py Outdated Show resolved Hide resolved
@jjyao jjyao added the go add ONLY when ready to merge, run all tests label Jun 3, 2024
Signed-off-by: kaihsun <[email protected]>
Signed-off-by: kaihsun <[email protected]>


@PublicAPI(stability="alpha")
@dataclass
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kw_only=True

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kw_only is added in Python 3.10. It may be incompatible with some Ray images.

Screenshot 2024-06-03 at 4 27 31 PM

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it.

@@ -1332,6 +1335,8 @@ def init(
timestamp, filename, line number, and message. See the source file
ray_constants.py for details. Ignored unless "configure_logging"
is true.
logging_config: [Experimental] Logging configuration will be applied to both
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:class:`~ray._private...`

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated 83cf819. I don't build the doc locally. I will wait for the CI to verify whether :class:~LoggingConfig`` works or not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:class:xxx is not used to add hyperlinks. It is just used to display 'xxx' as "inline code" (i.e. xxx). For example,

  • worker.py
    `ray_waitables` is a list of :class:`~ObjectRef` and :class:`~ObjectRefGenerator`.
    
  • After render
    Screenshot 2024-06-03 at 8 50 50 PM

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed logging_config: Optional[LoggingConfig] = None to logging_config: Optional["LoggingConfig"] = None. This change generates a hyperlink after rendering. However, the link points to Ray Serve's LoggingConfig instead of the LoggingConfig for structured logging.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used

logging_config: Optional["ray._private.structured_logging.logging_config.LoggingConfig"] = None

no hyperlink is generated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One possible solution that I can come up with is to use a different class name. Honestly, I don’t have any clue what happens under the hood.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll fix it later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solution: #45736 (comment)

python/ray/_private/worker.py Outdated Show resolved Hide resolved
Signed-off-by: kaihsun <[email protected]>
@kevin85421 kevin85421 removed the go add ONLY when ready to merge, run all tests label Jun 3, 2024
@jjyao jjyao merged commit 32cddae into ray-project:master Jun 4, 2024
5 checks passed
@kevin85421 kevin85421 assigned kevin85421 and unassigned rkooo567 Jun 4, 2024
richardsliu pushed a commit to richardsliu/ray that referenced this pull request Jun 12, 2024
…config to worker process (ray-project#45344)

Signed-off-by: kaihsun <[email protected]>
Signed-off-by: Richard Liu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants