
[data] Basic structured logging #47210

Merged

Conversation

@omatthew98 (Contributor) commented Aug 19, 2024

Why are these changes needed?

Adds structured logging to Ray Data. This will allow users to configure logging to use any of the following:

  • A user's custom logging file (existing functionality)
  • A default TEXT logger (existing functionality)
  • A default JSON logger (new functionality)

Examples:

Code snippet:

import ray
import time

def f(x):
    time.sleep(0.1)
    return x

def g(x):
    time.sleep(1)
    return x

ray.data.range(100).map(f).map(g, num_cpus=0.1).materialize()

JSON logging (new)

Console output:

❯ RAY_DATA_LOG_ENCODING="JSON" python log.py
2024-10-09 11:15:55,473	INFO worker.py:1777 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
2024-10-09 11:15:56,490	INFO streaming_executor.py:108 -- Starting execution of Dataset. Full logs are in /tmp/ray/session_2024-10-09_11-15-54_688421_42223/logs/ray-data
2024-10-09 11:15:56,490	INFO streaming_executor.py:109 -- Execution plan of Dataset: InputDataBuffer[Input] -> TaskPoolMapOperator[ReadRange->Map(f)] -> TaskPoolMapOperator[Map(g)]
✔️  Dataset execution finished in 10.01 seconds: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:10<00:00, 10.0 row/s]
- ReadRange->Map(f): 0 active, 0 queued, [cpu: 0.0, objects: 0.0B]: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:09<00:00, 10.0 row/s]
- Map(g): 0 active, 0 queued, [cpu: 0.0, objects: 32.0B]: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:09<00:00, 10.0 row/s]

ray-data.log:

❯ cat /tmp/ray/session_2024-10-09_11-15-54_688421_42223/logs/ray-data/ray-data.log
{"asctime": "2024-10-09 11:15:55,963", "levelname": "DEBUG", "message": "Autodetected parallelism=24 based on estimated_available_cpus=12 and estimated_data_size=800.", "filename": "util.py", "lineno": 203, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:15:56,489", "levelname": "DEBUG", "message": "Autodetected parallelism=24 based on estimated_available_cpus=12 and estimated_data_size=800.", "filename": "util.py", "lineno": 203, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:15:56,489", "levelname": "DEBUG", "message": "Expected in-memory size 800, block size 32.0", "filename": "set_read_parallelism.py", "lineno": 34, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:15:56,489", "levelname": "DEBUG", "message": "Size based split factor 1", "filename": "set_read_parallelism.py", "lineno": 42, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:15:56,489", "levelname": "DEBUG", "message": "Blocks after size splits 25", "filename": "set_read_parallelism.py", "lineno": 44, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:15:56,489", "levelname": "DEBUG", "message": "Using autodetected parallelism=24 for operator ReadRange to satisfy parallelism at least twice the available number of CPUs (12).", "filename": "set_read_parallelism.py", "lineno": 117, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:15:56,490", "levelname": "DEBUG", "message": "Estimated num output blocks 25", "filename": "set_read_parallelism.py", "lineno": 132, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:15:56,490", "levelname": "INFO", "message": "Starting execution of Dataset. Full logs are in /tmp/ray/session_2024-10-09_11-15-54_688421_42223/logs/ray-data", "filename": "streaming_executor.py", "lineno": 108, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:15:56,490", "levelname": "INFO", "message": "Execution plan of Dataset: InputDataBuffer[Input] -> TaskPoolMapOperator[ReadRange->Map(f)] -> TaskPoolMapOperator[Map(g)]", "filename": "streaming_executor.py", "lineno": 109, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:15:56,490", "levelname": "DEBUG", "message": "Execution config: ExecutionOptions(resource_limits=ExecutionResources(cpu=inf, gpu=inf, object_store_memory=inf), exclude_resources=ExecutionResources(cpu=0.0, gpu=0.0, object_store_memory=0.0B), locality_with_output=False, preserve_order=False, actor_locality_enabled=False, verbose_progress=True)", "filename": "streaming_executor.py", "lineno": 111, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:15:56,511", "levelname": "DEBUG", "message": "ConcurrencyCapBackpressurePolicy initialized with: {InputDataBuffer[Input]: inf, InputDataBuffer[Input] -> TaskPoolMapOperator[ReadRange->Map(f)]: inf, InputDataBuffer[Input] -> TaskPoolMapOperator[ReadRange->Map(f)] -> TaskPoolMapOperator[Map(g)]: inf}", "filename": "concurrency_cap_backpressure_policy.py", "lineno": 37, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:15:56,521", "levelname": "DEBUG", "message": "Operator Metrics:\nInput: {'num_inputs_received': 25, 'bytes_inputs_received': 67350, 'num_outputs_taken': 25, 'bytes_outputs_taken': 67350, 'task_submission_backpressure_time': 0, 'obj_store_mem_internal_inqueue_blocks': 0, 'obj_store_mem_internal_inqueue': 0, 'obj_store_mem_internal_outqueue_blocks': 0, 'obj_store_mem_internal_outqueue': 0, 'obj_store_mem_used': 0, 'cpu_usage': 0, 'gpu_usage': 0}\nReadRange->Map(f): {'num_inputs_received': 1, 'bytes_inputs_received': 2694, 'num_task_inputs_processed': 0, 'bytes_task_inputs_processed': 0, 'bytes_inputs_of_submitted_tasks': 2694, 'num_task_outputs_generated': 0, 'bytes_task_outputs_generated': 0, 'rows_task_outputs_generated': 0, 'num_outputs_taken': 0, 'bytes_outputs_taken': 0, 'num_outputs_of_finished_tasks': 0, 'bytes_outputs_of_finished_tasks': 0, 'num_tasks_submitted': 1, 'num_tasks_running': 1, 'num_tasks_have_outputs': 0, 'num_tasks_finished': 0, 'num_tasks_failed': 0, 'block_generation_time': 0, 'task_submission_backpressure_time': 0, 'obj_store_mem_internal_inqueue_blocks': 0, 'obj_store_mem_internal_inqueue': 0, 'obj_store_mem_internal_outqueue_blocks': 0, 'obj_store_mem_internal_outqueue': 0, 'obj_store_mem_pending_task_inputs': 2694, 'obj_store_mem_freed': 0, 'obj_store_mem_spilled': 0, 'obj_store_mem_used': 268435456, 'cpu_usage': 1, 'gpu_usage': 0, 'ray_remote_args': {'num_cpus': 1, 'scheduling_strategy': 'SPREAD'}}\nMap(g): {'num_inputs_received': 0, 'bytes_inputs_received': 0, 'num_task_inputs_processed': 0, 'bytes_task_inputs_processed': 0, 'bytes_inputs_of_submitted_tasks': 0, 'num_task_outputs_generated': 0, 'bytes_task_outputs_generated': 0, 'rows_task_outputs_generated': 0, 'num_outputs_taken': 0, 'bytes_outputs_taken': 0, 'num_outputs_of_finished_tasks': 0, 'bytes_outputs_of_finished_tasks': 0, 'num_tasks_submitted': 0, 'num_tasks_running': 0, 'num_tasks_have_outputs': 0, 'num_tasks_finished': 0, 'num_tasks_failed': 0, 'block_generation_time': 0, 'task_submission_backpressure_time': 0, 'obj_store_mem_internal_inqueue_blocks': 0, 'obj_store_mem_internal_inqueue': 0, 'obj_store_mem_internal_outqueue_blocks': 0, 'obj_store_mem_internal_outqueue': 0, 'obj_store_mem_pending_task_inputs': 0, 'obj_store_mem_freed': 0, 'obj_store_mem_spilled': 0, 'obj_store_mem_used': 0, 'cpu_usage': 0, 'gpu_usage': 0, 'ray_remote_args': {'num_cpus': 0.1}}\n", "filename": "streaming_executor.py", "lineno": 483, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:15:56,521", "levelname": "DEBUG", "message": "Execution Progress:", "filename": "streaming_executor.py", "lineno": 466, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:15:56,521", "levelname": "DEBUG", "message": "0: - Input: 0 active, 0 queued, [cpu: 0.0, objects: 0.0B], Blocks Outputted: 25/25", "filename": "streaming_executor.py", "lineno": 468, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:15:56,521", "levelname": "DEBUG", "message": "1: - ReadRange->Map(f): 1 active, 24 queued \ud83d\udea7, [cpu: 1.0, objects: 256.0MB], Blocks Outputted: 0/None", "filename": "streaming_executor.py", "lineno": 468, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:15:56,521", "levelname": "DEBUG", "message": "2: - Map(g): 0 active, 0 queued, [cpu: 0.0, objects: 0.0B], Blocks Outputted: 0/None", "filename": "streaming_executor.py", "lineno": 468, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:15:56,521", "levelname": "DEBUG", "message": "Operator InputDataBuffer[Input] completed. Operator Metrics:\n{'num_inputs_received': 25, 'bytes_inputs_received': 67350, 'num_outputs_taken': 25, 'bytes_outputs_taken': 67350, 'task_submission_backpressure_time': 0, 'obj_store_mem_internal_inqueue_blocks': 0, 'obj_store_mem_internal_inqueue': 0, 'obj_store_mem_internal_outqueue_blocks': 0, 'obj_store_mem_internal_outqueue': 0, 'obj_store_mem_used': 0, 'cpu_usage': 0, 'gpu_usage': 0}", "filename": "streaming_executor.py", "lineno": 338, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:15:58,879", "levelname": "DEBUG", "message": "Operator InputDataBuffer[Input] -> TaskPoolMapOperator[ReadRange->Map(f)] completed. Operator Metrics:\n{'num_inputs_received': 25, 'bytes_inputs_received': 67350, 'num_task_inputs_processed': 25, 'bytes_task_inputs_processed': 67350, 'bytes_inputs_of_submitted_tasks': 67350, 'num_task_outputs_generated': 25, 'bytes_task_outputs_generated': 800, 'rows_task_outputs_generated': 100, 'num_outputs_taken': 25, 'bytes_outputs_taken': 800, 'num_outputs_of_finished_tasks': 25, 'bytes_outputs_of_finished_tasks': 800, 'num_tasks_submitted': 25, 'num_tasks_running': 0, 'num_tasks_have_outputs': 25, 'num_tasks_finished': 25, 'num_tasks_failed': 0, 'block_generation_time': 10.345995793, 'task_submission_backpressure_time': 2.3538285830000003, 'obj_store_mem_internal_inqueue_blocks': 0, 'obj_store_mem_internal_inqueue': 0, 'obj_store_mem_internal_outqueue_blocks': 0, 'obj_store_mem_internal_outqueue': 0, 'obj_store_mem_pending_task_inputs': 0, 'obj_store_mem_freed': 67350, 'obj_store_mem_spilled': 0, 'obj_store_mem_used': 800, 'cpu_usage': 0, 'gpu_usage': 0, 'ray_remote_args': {'num_cpus': 1, 'scheduling_strategy': 'SPREAD'}}", "filename": "streaming_executor.py", "lineno": 338, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:16:01,533", "levelname": "DEBUG", "message": "Operator Metrics:\nInput: {'num_inputs_received': 25, 'bytes_inputs_received': 67350, 'num_outputs_taken': 25, 'bytes_outputs_taken': 67350, 'task_submission_backpressure_time': 0, 'obj_store_mem_internal_inqueue_blocks': 0, 'obj_store_mem_internal_inqueue': 0, 'obj_store_mem_internal_outqueue_blocks': 0, 'obj_store_mem_internal_outqueue': 0, 'obj_store_mem_used': 0, 'cpu_usage': 0, 'gpu_usage': 0}\nReadRange->Map(f): {'num_inputs_received': 25, 'bytes_inputs_received': 67350, 'num_task_inputs_processed': 25, 'bytes_task_inputs_processed': 67350, 'bytes_inputs_of_submitted_tasks': 67350, 'num_task_outputs_generated': 25, 'bytes_task_outputs_generated': 800, 'rows_task_outputs_generated': 100, 'num_outputs_taken': 25, 'bytes_outputs_taken': 800, 'num_outputs_of_finished_tasks': 25, 'bytes_outputs_of_finished_tasks': 800, 'num_tasks_submitted': 25, 'num_tasks_running': 0, 'num_tasks_have_outputs': 25, 'num_tasks_finished': 25, 'num_tasks_failed': 0, 'block_generation_time': 10.345995793, 'task_submission_backpressure_time': 2.3538285830000003, 'obj_store_mem_internal_inqueue_blocks': 0, 'obj_store_mem_internal_inqueue': 0, 'obj_store_mem_internal_outqueue_blocks': 0, 'obj_store_mem_internal_outqueue': 0, 'obj_store_mem_pending_task_inputs': 0, 'obj_store_mem_freed': 67350, 'obj_store_mem_spilled': 0, 'obj_store_mem_used': 768, 'cpu_usage': 0, 'gpu_usage': 0, 'ray_remote_args': {'num_cpus': 1, 'scheduling_strategy': 'SPREAD'}}\nMap(g): {'num_inputs_received': 25, 'bytes_inputs_received': 800, 'num_task_inputs_processed': 1, 'bytes_task_inputs_processed': 32, 'bytes_inputs_of_submitted_tasks': 800, 'num_task_outputs_generated': 1, 'bytes_task_outputs_generated': 32, 'rows_task_outputs_generated': 4, 'num_outputs_taken': 1, 'bytes_outputs_taken': 32, 'num_outputs_of_finished_tasks': 1, 'bytes_outputs_of_finished_tasks': 32, 'num_tasks_submitted': 25, 'num_tasks_running': 24, 'num_tasks_have_outputs': 1, 'num_tasks_finished': 1, 'num_tasks_failed': 0, 'block_generation_time': 4.0186684999999995, 'task_submission_backpressure_time': 3.3788410420000003, 'obj_store_mem_internal_inqueue_blocks': 0, 'obj_store_mem_internal_inqueue': 0, 'obj_store_mem_internal_outqueue_blocks': 0, 'obj_store_mem_internal_outqueue': 0, 'obj_store_mem_pending_task_inputs': 768, 'obj_store_mem_freed': 32, 'obj_store_mem_spilled': 0, 'obj_store_mem_used': 768.0, 'cpu_usage': 2.4000000000000004, 'gpu_usage': 0, 'ray_remote_args': {'num_cpus': 0.1, 'scheduling_strategy': 'SPREAD'}}\n", "filename": "streaming_executor.py", "lineno": 483, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:16:01,534", "levelname": "DEBUG", "message": "Execution Progress:", "filename": "streaming_executor.py", "lineno": 466, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:16:01,534", "levelname": "DEBUG", "message": "0: - Input: 0 active, 0 queued, [cpu: 0.0, objects: 0.0B], Blocks Outputted: 25/25", "filename": "streaming_executor.py", "lineno": 468, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:16:01,534", "levelname": "DEBUG", "message": "1: - ReadRange->Map(f): 0 active, 0 queued, [cpu: 0.0, objects: 768.0B], Blocks Outputted: 25/25", "filename": "streaming_executor.py", "lineno": 468, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:16:01,534", "levelname": "DEBUG", "message": "2: - Map(g): 24 active, 0 queued, [cpu: 2.4, objects: 768.0B], Blocks Outputted: 1/25", "filename": "streaming_executor.py", "lineno": 468, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:16:06,493", "levelname": "DEBUG", "message": "Operator InputDataBuffer[Input] -> TaskPoolMapOperator[ReadRange->Map(f)] -> TaskPoolMapOperator[Map(g)] completed. Operator Metrics:\n{'num_inputs_received': 25, 'bytes_inputs_received': 800, 'num_task_inputs_processed': 25, 'bytes_task_inputs_processed': 800, 'bytes_inputs_of_submitted_tasks': 800, 'num_task_outputs_generated': 25, 'bytes_task_outputs_generated': 800, 'rows_task_outputs_generated': 100, 'num_outputs_taken': 25, 'bytes_outputs_taken': 800, 'num_outputs_of_finished_tasks': 25, 'bytes_outputs_of_finished_tasks': 800, 'num_tasks_submitted': 25, 'num_tasks_running': 0, 'num_tasks_have_outputs': 25, 'num_tasks_finished': 25, 'num_tasks_failed': 0, 'block_generation_time': 100.44397699799998, 'task_submission_backpressure_time': 3.3788410420000003, 'obj_store_mem_internal_inqueue_blocks': 0, 'obj_store_mem_internal_inqueue': 0, 'obj_store_mem_internal_outqueue_blocks': 0, 'obj_store_mem_internal_outqueue': 0, 'obj_store_mem_pending_task_inputs': 0, 'obj_store_mem_freed': 800, 'obj_store_mem_spilled': 0, 'obj_store_mem_used': 32, 'cpu_usage': 0, 'gpu_usage': 0, 'ray_remote_args': {'num_cpus': 0.1, 'scheduling_strategy': 'SPREAD'}}", "filename": "streaming_executor.py", "lineno": 338, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}
{"asctime": "2024-10-09 11:16:06,498", "levelname": "DEBUG", "message": "Shutting down <StreamingExecutor(StreamingExecutor-26a7573d42af437b94e5fac95c711658, stopped daemon 13186969600)>.", "filename": "streaming_executor.py", "lineno": 182, "job_id": "01000000", "worker_id": "01000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "61a34d5f67987e4e6bcb41ea37b914d90d83a6aa3057a25e731d4840"}

TEXT logging (unchanged)

Console output:

❯ RAY_DATA_LOG_ENCODING="TEXT" python log.py
2024-09-10 14:02:09,407	INFO worker.py:1777 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
2024-09-10 14:02:10,353	INFO streaming_executor.py:108 -- Starting execution of Dataset. Full logs are in /tmp/ray/session_2024-09-10_14-02-08_613994_98769/logs/ray-data
2024-09-10 14:02:10,353	INFO streaming_executor.py:109 -- Execution plan of Dataset: InputDataBuffer[Input] -> TaskPoolMapOperator[ReadRange->Map(f)] -> TaskPoolMapOperator[Map(g)]
✔️  Dataset execution finished in 9.86 seconds: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:09<00:00, 10.1 row/s]
- ReadRange->Map(f): 0 active, 0 queued, [cpu: 0.0, objects: 0.0B]: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:09<00:00, 10.2 row/s]
- Map(g): 0 active, 0 queued, [cpu: 0.0, objects: 96.0B]: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:09<00:00, 10.2 row/s]

ray-data.log:

2024-09-10 14:02:09,843	DEBUG util.py:203 -- Autodetected parallelism=24 based on estimated_available_cpus=12 and estimated_data_size=800.
2024-09-10 14:02:10,353	DEBUG util.py:203 -- Autodetected parallelism=24 based on estimated_available_cpus=12 and estimated_data_size=800.
2024-09-10 14:02:10,353	DEBUG set_read_parallelism.py:34 -- Expected in-memory size 800, block size 32.0
2024-09-10 14:02:10,353	DEBUG set_read_parallelism.py:42 -- Size based split factor 1
2024-09-10 14:02:10,353	DEBUG set_read_parallelism.py:44 -- Blocks after size splits 25
2024-09-10 14:02:10,353	DEBUG set_read_parallelism.py:117 -- Using autodetected parallelism=24 for operator ReadRange to satisfy parallelism at least twice the available number of CPUs (12).
2024-09-10 14:02:10,353	DEBUG set_read_parallelism.py:132 -- Estimated num output blocks 25
2024-09-10 14:02:10,353	INFO streaming_executor.py:108 -- Starting execution of Dataset. Full logs are in /tmp/ray/session_2024-09-10_14-02-08_613994_98769/logs/ray-data
2024-09-10 14:02:10,353	INFO streaming_executor.py:109 -- Execution plan of Dataset: InputDataBuffer[Input] -> TaskPoolMapOperator[ReadRange->Map(f)] -> TaskPoolMapOperator[Map(g)]
2024-09-10 14:02:10,353	DEBUG streaming_executor.py:111 -- Execution config: ExecutionOptions(resource_limits=ExecutionResources(cpu=inf, gpu=inf, object_store_memory=inf), exclude_resources=ExecutionResources(cpu=0.0, gpu=0.0, object_store_memory=0.0B), locality_with_output=False, preserve_order=False, actor_locality_enabled=False, verbose_progress=True)
2024-09-10 14:02:10,383	DEBUG concurrency_cap_backpressure_policy.py:37 -- ConcurrencyCapBackpressurePolicy initialized with: {InputDataBuffer[Input]: inf, InputDataBuffer[Input] -> TaskPoolMapOperator[ReadRange->Map(f)]: inf, InputDataBuffer[Input] -> TaskPoolMapOperator[ReadRange->Map(f)] -> TaskPoolMapOperator[Map(g)]: inf}
2024-09-10 14:02:10,393	DEBUG streaming_executor.py:478 -- Operator Metrics:
Input: {'num_inputs_received': 25, 'bytes_inputs_received': 66350, 'num_outputs_taken': 25, 'bytes_outputs_taken': 66350, 'task_submission_backpressure_time': 0, 'obj_store_mem_internal_inqueue_blocks': 0, 'obj_store_mem_internal_inqueue': 0, 'obj_store_mem_internal_outqueue_blocks': 0, 'obj_store_mem_internal_outqueue': 0, 'obj_store_mem_used': 0, 'cpu_usage': 0, 'gpu_usage': 0}
ReadRange->Map(f): {'num_inputs_received': 1, 'bytes_inputs_received': 2654, 'num_task_inputs_processed': 0, 'bytes_task_inputs_processed': 0, 'bytes_inputs_of_submitted_tasks': 2654, 'num_task_outputs_generated': 0, 'bytes_task_outputs_generated': 0, 'rows_task_outputs_generated': 0, 'num_outputs_taken': 0, 'bytes_outputs_taken': 0, 'num_outputs_of_finished_tasks': 0, 'bytes_outputs_of_finished_tasks': 0, 'num_tasks_submitted': 1, 'num_tasks_running': 1, 'num_tasks_have_outputs': 0, 'num_tasks_finished': 0, 'num_tasks_failed': 0, 'block_generation_time': 0, 'task_submission_backpressure_time': 0, 'obj_store_mem_internal_inqueue_blocks': 0, 'obj_store_mem_internal_inqueue': 0, 'obj_store_mem_internal_outqueue_blocks': 0, 'obj_store_mem_internal_outqueue': 0, 'obj_store_mem_pending_task_inputs': 2654, 'obj_store_mem_freed': 0, 'obj_store_mem_spilled': 0, 'obj_store_mem_used': 268435456, 'cpu_usage': 1, 'gpu_usage': 0, 'ray_remote_args': {'num_cpus': 1, 'scheduling_strategy': 'SPREAD'}}
Map(g): {'num_inputs_received': 0, 'bytes_inputs_received': 0, 'num_task_inputs_processed': 0, 'bytes_task_inputs_processed': 0, 'bytes_inputs_of_submitted_tasks': 0, 'num_task_outputs_generated': 0, 'bytes_task_outputs_generated': 0, 'rows_task_outputs_generated': 0, 'num_outputs_taken': 0, 'bytes_outputs_taken': 0, 'num_outputs_of_finished_tasks': 0, 'bytes_outputs_of_finished_tasks': 0, 'num_tasks_submitted': 0, 'num_tasks_running': 0, 'num_tasks_have_outputs': 0, 'num_tasks_finished': 0, 'num_tasks_failed': 0, 'block_generation_time': 0, 'task_submission_backpressure_time': 0, 'obj_store_mem_internal_inqueue_blocks': 0, 'obj_store_mem_internal_inqueue': 0, 'obj_store_mem_internal_outqueue_blocks': 0, 'obj_store_mem_internal_outqueue': 0, 'obj_store_mem_pending_task_inputs': 0, 'obj_store_mem_freed': 0, 'obj_store_mem_spilled': 0, 'obj_store_mem_used': 0, 'cpu_usage': 0, 'gpu_usage': 0, 'ray_remote_args': {'num_cpus': 0.1}}

2024-09-10 14:02:10,393	DEBUG streaming_executor.py:461 -- Execution Progress:
2024-09-10 14:02:10,393	DEBUG streaming_executor.py:463 -- 0: - Input: 0 active, 0 queued, [cpu: 0.0, objects: 0.0B], Blocks Outputted: 25/25
2024-09-10 14:02:10,393	DEBUG streaming_executor.py:463 -- 1: - ReadRange->Map(f): 1 active, 24 queued 🚧, [cpu: 1.0, objects: 256.0MB], Blocks Outputted: 0/None
2024-09-10 14:02:10,393	DEBUG streaming_executor.py:463 -- 2: - Map(g): 0 active, 0 queued, [cpu: 0.0, objects: 0.0B], Blocks Outputted: 0/None
2024-09-10 14:02:10,393	DEBUG streaming_executor.py:338 -- Operator InputDataBuffer[Input] completed. Operator Metrics:
{'num_inputs_received': 25, 'bytes_inputs_received': 66350, 'num_outputs_taken': 25, 'bytes_outputs_taken': 66350, 'task_submission_backpressure_time': 0, 'obj_store_mem_internal_inqueue_blocks': 0, 'obj_store_mem_internal_inqueue': 0, 'obj_store_mem_internal_outqueue_blocks': 0, 'obj_store_mem_internal_outqueue': 0, 'obj_store_mem_used': 0, 'cpu_usage': 0, 'gpu_usage': 0}
2024-09-10 14:02:13,165	DEBUG streaming_executor.py:338 -- Operator InputDataBuffer[Input] -> TaskPoolMapOperator[ReadRange->Map(f)] completed. Operator Metrics:
{'num_inputs_received': 25, 'bytes_inputs_received': 66350, 'num_task_inputs_processed': 25, 'bytes_task_inputs_processed': 66350, 'bytes_inputs_of_submitted_tasks': 66350, 'num_task_outputs_generated': 25, 'bytes_task_outputs_generated': 800, 'rows_task_outputs_generated': 100, 'num_outputs_taken': 25, 'bytes_outputs_taken': 800, 'num_outputs_of_finished_tasks': 25, 'bytes_outputs_of_finished_tasks': 800, 'num_tasks_submitted': 25, 'num_tasks_running': 0, 'num_tasks_have_outputs': 25, 'num_tasks_finished': 25, 'num_tasks_failed': 0, 'block_generation_time': 10.35352417, 'task_submission_backpressure_time': 2.240482626999999, 'obj_store_mem_internal_inqueue_blocks': 0, 'obj_store_mem_internal_inqueue': 0, 'obj_store_mem_internal_outqueue_blocks': 0, 'obj_store_mem_internal_outqueue': 0, 'obj_store_mem_pending_task_inputs': 0, 'obj_store_mem_freed': 66350, 'obj_store_mem_spilled': 0, 'obj_store_mem_used': 800, 'cpu_usage': 0, 'gpu_usage': 0, 'ray_remote_args': {'num_cpus': 1, 'scheduling_strategy': 'SPREAD'}}
2024-09-10 14:02:15,483	DEBUG streaming_executor.py:478 -- Operator Metrics:
Input: {'num_inputs_received': 25, 'bytes_inputs_received': 66350, 'num_outputs_taken': 25, 'bytes_outputs_taken': 66350, 'task_submission_backpressure_time': 0, 'obj_store_mem_internal_inqueue_blocks': 0, 'obj_store_mem_internal_inqueue': 0, 'obj_store_mem_internal_outqueue_blocks': 0, 'obj_store_mem_internal_outqueue': 0, 'obj_store_mem_used': 0, 'cpu_usage': 0, 'gpu_usage': 0}
ReadRange->Map(f): {'num_inputs_received': 25, 'bytes_inputs_received': 66350, 'num_task_inputs_processed': 25, 'bytes_task_inputs_processed': 66350, 'bytes_inputs_of_submitted_tasks': 66350, 'num_task_outputs_generated': 25, 'bytes_task_outputs_generated': 800, 'rows_task_outputs_generated': 100, 'num_outputs_taken': 25, 'bytes_outputs_taken': 800, 'num_outputs_of_finished_tasks': 25, 'bytes_outputs_of_finished_tasks': 800, 'num_tasks_submitted': 25, 'num_tasks_running': 0, 'num_tasks_have_outputs': 25, 'num_tasks_finished': 25, 'num_tasks_failed': 0, 'block_generation_time': 10.35352417, 'task_submission_backpressure_time': 2.240482626999999, 'obj_store_mem_internal_inqueue_blocks': 0, 'obj_store_mem_internal_inqueue': 0, 'obj_store_mem_internal_outqueue_blocks': 0, 'obj_store_mem_internal_outqueue': 0, 'obj_store_mem_pending_task_inputs': 0, 'obj_store_mem_freed': 66350, 'obj_store_mem_spilled': 0, 'obj_store_mem_used': 768, 'cpu_usage': 0, 'gpu_usage': 0, 'ray_remote_args': {'num_cpus': 1, 'scheduling_strategy': 'SPREAD'}}
Map(g): {'num_inputs_received': 25, 'bytes_inputs_received': 800, 'num_task_inputs_processed': 1, 'bytes_task_inputs_processed': 32, 'bytes_inputs_of_submitted_tasks': 800, 'num_task_outputs_generated': 1, 'bytes_task_outputs_generated': 32, 'rows_task_outputs_generated': 4, 'num_outputs_taken': 1, 'bytes_outputs_taken': 32, 'num_outputs_of_finished_tasks': 1, 'bytes_outputs_of_finished_tasks': 32, 'num_tasks_submitted': 25, 'num_tasks_running': 24, 'num_tasks_have_outputs': 1, 'num_tasks_finished': 1, 'num_tasks_failed': 0, 'block_generation_time': 4.0134345, 'task_submission_backpressure_time': 3.350887582999999, 'obj_store_mem_internal_inqueue_blocks': 0, 'obj_store_mem_internal_inqueue': 0, 'obj_store_mem_internal_outqueue_blocks': 0, 'obj_store_mem_internal_outqueue': 0, 'obj_store_mem_pending_task_inputs': 768, 'obj_store_mem_freed': 32, 'obj_store_mem_spilled': 0, 'obj_store_mem_used': 768.0, 'cpu_usage': 2.4000000000000004, 'gpu_usage': 0, 'ray_remote_args': {'num_cpus': 0.1, 'scheduling_strategy': 'SPREAD'}}

2024-09-10 14:02:15,483	DEBUG streaming_executor.py:461 -- Execution Progress:
2024-09-10 14:02:15,483	DEBUG streaming_executor.py:463 -- 0: - Input: 0 active, 0 queued, [cpu: 0.0, objects: 0.0B], Blocks Outputted: 25/25
2024-09-10 14:02:15,483	DEBUG streaming_executor.py:463 -- 1: - ReadRange->Map(f): 0 active, 0 queued, [cpu: 0.0, objects: 768.0B], Blocks Outputted: 25/25
2024-09-10 14:02:15,483	DEBUG streaming_executor.py:463 -- 2: - Map(g): 24 active, 0 queued, [cpu: 2.4, objects: 768.0B], Blocks Outputted: 1/25
2024-09-10 14:02:20,214	DEBUG streaming_executor.py:338 -- Operator InputDataBuffer[Input] -> TaskPoolMapOperator[ReadRange->Map(f)] -> TaskPoolMapOperator[Map(g)] completed. Operator Metrics:
{'num_inputs_received': 25, 'bytes_inputs_received': 800, 'num_task_inputs_processed': 25, 'bytes_task_inputs_processed': 800, 'bytes_inputs_of_submitted_tasks': 800, 'num_task_outputs_generated': 25, 'bytes_task_outputs_generated': 800, 'rows_task_outputs_generated': 100, 'num_outputs_taken': 25, 'bytes_outputs_taken': 800, 'num_outputs_of_finished_tasks': 25, 'bytes_outputs_of_finished_tasks': 800, 'num_tasks_submitted': 25, 'num_tasks_running': 0, 'num_tasks_have_outputs': 25, 'num_tasks_finished': 25, 'num_tasks_failed': 0, 'block_generation_time': 100.446355003, 'task_submission_backpressure_time': 3.350887582999999, 'obj_store_mem_internal_inqueue_blocks': 0, 'obj_store_mem_internal_inqueue': 0, 'obj_store_mem_internal_outqueue_blocks': 0, 'obj_store_mem_internal_outqueue': 0, 'obj_store_mem_pending_task_inputs': 0, 'obj_store_mem_freed': 800, 'obj_store_mem_spilled': 0, 'obj_store_mem_used': 96, 'cpu_usage': 0, 'gpu_usage': 0, 'ray_remote_args': {'num_cpus': 0.1, 'scheduling_strategy': 'SPREAD'}}
2024-09-10 14:02:20,217	DEBUG streaming_executor.py:182 -- Shutting down <StreamingExecutor(StreamingExecutor-9091586451d54141b2ef511d6f302bde, stopped daemon 13086404608)>.

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@omatthew98 force-pushed the mowen/ray-data-structured-logging branch from 18a921c to 7c6313d on September 9, 2024.
@@ -0,0 +1,34 @@
version: 1
Contributor:

What is this YAML file format? Is this something specific to Ray Data? Is there documentation about this?

Contributor Author:

The YAML file format is specific to Python's logging configuration (more docs here). We just load the YAML directly with something like:

    import logging.config

    import yaml

    # config_path points at the YAML logging config to load
    with open(config_path) as file:
        config = yaml.safe_load(file)
    logging.config.dictConfig(config)

We could probably do a better job of explaining that this can be used in the brief logging docs here.
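
For anyone new to that format, here is a minimal sketch of the dictConfig schema such a YAML file maps onto once loaded (a generic example; the formatter, handler, and level names are illustrative rather than the ones defined in this PR):

    import logging
    import logging.config

    logging.config.dictConfig({
        "version": 1,
        "formatters": {
            # Mirrors the text layout seen in the console output above.
            "ray_text": {
                "format": "%(asctime)s\t%(levelname)s %(filename)s:%(lineno)d -- %(message)s"
            },
        },
        "handlers": {
            "console": {
                "class": "logging.StreamHandler",
                "formatter": "ray_text",
                "level": "INFO",
            },
        },
        "loggers": {
            "ray.data": {"level": "DEBUG", "handlers": ["console"], "propagate": False},
        },
    })

    logging.getLogger("ray.data").info("configured via the same schema the YAML encodes")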

Contributor:

maybe add a comment in this file explaining the origin/format of this YAML file?

@GeneDer (Contributor) left a comment:

No blocker for me, but we should probably add some tests to ensure the behavior, and follow up with product and doc changes.

Also, maybe out of scope, but right now everything is configured through env var. Does it make sense to create a logging config for users to do other configurations such as log level or log file locations?

@omatthew98 (Contributor Author):

> No blocker for me, but we should probably add some tests to ensure the behavior, and follow up with product and doc changes.
>
> Also, maybe out of scope, but right now everything is configured through env var. Does it make sense to create a logging config for users to do other configurations such as log level or log file locations?

Users could create their own logging.yaml configuration (which could be forked off either of the default examples we have for TEXT or JSON), and specify the path to be used instead of the default. Does that make sense as a workflow or is there something that would be preferred?

@GeneDer (Contributor) commented Sep 9, 2024:

> No blocker for me, but we should probably add some tests to ensure the behavior, and follow up with product and doc changes.
> Also, maybe out of scope, but right now everything is configured through env var. Does it make sense to create a logging config for users to do other configurations such as log level or log file locations?
>
> Users could create their own logging.yaml configuration (which could be forked off either of the default examples we have for TEXT or JSON), and specify the path to be used instead of the default. Does that make sense as a workflow or is there something that would be preferred?

Yep, I understand there are essentially unlimited degrees of freedom for users to define their own YAML, but then they would have to specify everything on their own, right? Just wondering if there is something in between for users who just need a small tweak. But I totally understand if there's no such use case for Ray Data; no need to over-engineer this.

@GeneDer (Contributor) left a comment:

LGTM!

@omatthew98 (Contributor Author):

> Yep, I understand there are essentially unlimited degrees of freedom for users to define their own YAML, but then they would have to specify everything on their own, right? Just wondering if there is something in between for users who just need a small tweak. But I totally understand if there's no such use case for Ray Data; no need to over-engineer this.

I think to keep things simple we will leave it as is. My understanding is that the vast majority of users will use the default settings (maybe not even JSON file logging), and those who do need something custom will likely have the expertise to set things up however they desire. We can revisit this in the future if it becomes a pain point.

@omatthew98 added the go label (add ONLY when ready to merge, run all tests) on Sep 10, 2024.
os.path.join(os.path.dirname(__file__), "logging_json.yaml")
)

# Environment variable to specify the encoding of the log messages
Contributor:

also comment what options are available?

Comment on lines 103 to 100
environment variable. If the variable isn't set, this function loads the
"logging.yaml" file that is adjacent to this module.
Member:

Docstring is slightly out-of-date with the code


@kevin85421 (Member) left a comment:

Hi @omatthew98, based on the logs in the PR description, it seems that the core context is not being added to the logs in ray-data.log. The main use case for structured logging is to enable users to associate Job ID, Actor ID, and Task ID with the logs.

@kevin85421 self-assigned this on Oct 4, 2024.
@hongpeng-guo (Contributor) commented Oct 7, 2024:

> Hi @omatthew98, based on the logs in the PR description, it seems that the core context is not being added to the logs in ray-data.log. The main use case for structured logging is to enable users to associate Job ID, Actor ID, and Task ID with the logs.

cc @omatthew98 I also hit this problem when I was working on the Train structured logging. I think the fix is to use the (): key instead of the class: key when registering the core context filter in logging.yaml. A reference discussion: https://stackoverflow.com/questions/61456543/logging-in-python-with-yaml-and-filter


filters:
  console_filter:
    (): ray.data._internal.logging.HiddenRecordFilter
  core_context_filter:
    class: ray._private.ray_logging.filters.CoreContextFilter
Contributor:

(): ray._private.ray_logging.filters.CoreContextFilter will enable this core context filter, I think.
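
For background, here is a small generic sketch of why the (): key matters in Python's logging.config.dictConfig. The filter class below is a stand-in, not Ray's CoreContextFilter; as far as the standard schema goes, a custom filter class is instantiated through the user-defined-object () key (class: is the convention for handlers, not filters), which is what the suggestion above switches to:

    import logging
    import logging.config

    class DemoContextFilter(logging.Filter):
        """Stand-in for a core-context-style filter; attaches metadata, filters nothing."""

        def filter(self, record):
            record.job_id = "01000000"  # illustrative value
            return True

    logging.config.dictConfig({
        "version": 1,
        "filters": {
            # "()" is the dictConfig hook for instantiating a custom class.
            "core_context_filter": {"()": DemoContextFilter},
        },
        "handlers": {
            "console": {
                "class": "logging.StreamHandler",
                "filters": ["core_context_filter"],
            },
        },
        "loggers": {"demo": {"handlers": ["console"], "level": "INFO"}},
    })

    logging.getLogger("demo").info("this record now carries .job_id via the filter")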

@omatthew98 (Contributor Author):

> Hi @omatthew98, based on the logs in the PR description, it seems that the core context is not being added to the logs in ray-data.log. The main use case for structured logging is to enable users to associate Job ID, Actor ID, and Task ID with the logs.

@kevin85421 I updated this PR with the suggestion from @hongpeng-guo, and it now seems to include those tags. The example in the PR description has been updated if you want to confirm.

@bveeramani (Member) left a comment:

LGTM

class: ray.data._internal.logging.SessionFileHandler
formatter: ray_json
filename: ray-data.log
filters: [core_context_filter]
Member:

Do we need the core context filter for the default (text) log?

Contributor Author:

I don't think so. The core context filter doesn't actually filter anything out; it just adds metadata to the log records, and for text logs that metadata is not shown.
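
To illustrate that point with a generic sketch (standard library behavior, not Ray's actual formatter classes): the extra attributes ride along on the LogRecord, and only a formatter that explicitly emits them, such as a JSON formatter, makes them visible.

    import json
    import logging

    # Build a record and attach the kind of attribute a core-context filter adds.
    record = logging.LogRecord("ray.data", logging.INFO, "util.py", 203, "hello", None, None)
    record.job_id = "01000000"  # illustrative value

    # A text formatter renders only the fields named in its format string,
    # so the extra attribute is carried but never shown.
    text = logging.Formatter("%(levelname)s %(filename)s:%(lineno)d -- %(message)s")
    print(text.format(record))

    # A JSON-style formatter can dump whichever record attributes it wants,
    # which is how keys like job_id/worker_id/node_id end up in ray-data.log.
    print(json.dumps({
        "levelname": record.levelname,
        "message": record.getMessage(),
        "filename": record.filename,
        "lineno": record.lineno,
        "job_id": record.job_id,
    }))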

Comment on lines 102 to 103
If "RAY_DATA_LOG_ENCODING" is specified as "JSON" we will enable JSON reading mode
if using the default logging config.
Member:

I'm confused about what JSON reading mode refers to -- does this mean JSON log format?

Contributor Author:

Good catch, it should be JSON logging mode not JSON reading.

Comment on lines 14 to 18
# Env. variable to specify the encoding of the file logs when using the default config.
RAY_DATA_LOG_ENCODING = "RAY_DATA_LOG_ENCODING"

# Env. variable to specify the logging config path; use defaults if not set
RAY_DATA_LOGGING_CONFIG_PATH = "RAY_DATA_LOGGING_CONFIG_PATH"
Member:

Nit: Maybe rename to make it clear these refer to environment variable names and not the values? I'd think RAY_DATA_LOGGING_CONFIG_PATH refers to the actual path and not the name of an environment variable.

Contributor Author:

Going to suffix both of these with _ENV_VAR_NAME to be super explicit.

# After configuring logger, warn if RAY_DATA_LOGGING_CONFIG_PATH is used with
# RAY_DATA_LOG_ENCODING, because they are not both supported together.
if config_path is not None and log_encoding is not None:
logger = logging.getLogger("ray.data")
Member:

Nit: What's the motivation for using the parent logger rather than this module's logger? Feel like it's more conventional to do logging.getLogger(__name__) unless there's a strong reason to do otherwise

Contributor Author:

I don't think there is a solid reason to use the parent logger; I'll switch to the module logger.
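
A quick note on why either choice lands in the same place (generic logger-hierarchy behavior, nothing specific to this PR): the module logger is a descendant of "ray.data", so with default propagation its records still reach whatever handlers the config attaches to "ray.data".

    import logging

    data_logger = logging.getLogger("ray.data")
    data_logger.addHandler(logging.StreamHandler())  # stand-in for the configured handler

    module_logger = logging.getLogger("ray.data._internal.logging")  # what __name__ gives there

    # Propagation is on by default, so records logged on the module logger bubble up
    # to the handler on "ray.data"; switching loggers mainly changes the `name`
    # recorded on each LogRecord, not where the output ends up.
    module_logger.warning("reaches the handler configured on the 'ray.data' logger")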

Comment on lines 130 to 133
logger.warning(
"Using `RAY_DATA_LOG_ENCODING` is not supported with "
+ "`RAY_DATA_LOGGING_CONFIG_PATH`"
)
Member:

Nit: If these aren't supported, should we just error? Feels like it might be confusing if we allow unsupported combinations (e.g., what's the behavior if you do this?).

Contributor Author:

If they specify RAY_DATA_LOGGING_CONFIG_PATH, then that will be used exactly as the logging configuration. If they try to use RAY_DATA_LOG_ENCODING alongside it, that env var will essentially be ignored. My thought was that if the user is using a custom configuration, this should be the expected behavior. The warning is meant to tell them that the env var doesn't magically make JSON logging work; they will still get logs exactly as configured in their custom config.
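
Put differently, here is a hedged sketch of the precedence being described (the helper and the default file names are illustrative, not the PR's actual code):

    import logging.config
    import os

    import yaml

    def _load_yaml(path):  # hypothetical helper
        with open(path) as f:
            return yaml.safe_load(f)

    config_path = os.environ.get("RAY_DATA_LOGGING_CONFIG_PATH")
    log_encoding = os.environ.get("RAY_DATA_LOG_ENCODING")

    if config_path is not None:
        # A custom config is used verbatim; RAY_DATA_LOG_ENCODING is not layered
        # on top of it, it is simply ignored (with the warning discussed above).
        config = _load_yaml(config_path)
    elif log_encoding is not None and log_encoding.upper() == "JSON":
        config = _load_yaml("logging_json.yaml")  # bundled JSON default
    else:
        config = _load_yaml("logging.yaml")  # bundled TEXT default

    logging.config.dictConfig(config)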

Comment on lines 111 to 113
# Dynamically load env vars
config_path = os.environ.get("RAY_DATA_LOGGING_CONFIG_PATH")
log_encoding = os.environ.get("RAY_DATA_LOG_ENCODING")
Member:

For my own understanding, what happens if we don't load these dynamically?

Contributor Author:

The main difference is that if you edit the environment variables and then call configure_logging again, this picks up the changes, whereas loading them once at module initialization would not. In our tests we do something similar: we modify the variables and then call configure_logging, so this works better with that pattern. In general it just ensures the most recent environment variable values are used.
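
As a hedged sketch of the test pattern described above (the module path and the configure_logging name are taken from the files under review, but treat the exact import as an assumption):

    import os

    from ray.data._internal.logging import configure_logging  # assumed import path

    # Because the env vars are read inside configure_logging() at call time,
    # a test can flip the encoding and reconfigure without reloading the module.
    os.environ["RAY_DATA_LOG_ENCODING"] = "JSON"
    configure_logging()
    # ... assert that the ray-data file handler now uses the JSON formatter ...

    os.environ["RAY_DATA_LOG_ENCODING"] = "TEXT"
    configure_logging()
    # ... assert that the handler is back to the text formatter ...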

@omatthew98 force-pushed the mowen/ray-data-structured-logging branch from 4f35375 to b4e0e6a on October 11, 2024.
@bveeramani merged commit 04098a6 into ray-project:master on Oct 11, 2024.
5 checks passed
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
Adds structured logging to Ray Data. This will allow users to configure
logging to use any of the following:
* A user's custom logging file (existing functionality)
* A default TEXT logger (existing functionality)
* A default JSON logger (new functionality)

---------

Signed-off-by: Matthew Owen <[email protected]>
Signed-off-by: Matthew Owen <[email protected]>
Co-authored-by: Scott Lee <[email protected]>
Co-authored-by: Hao Chen <[email protected]>
Signed-off-by: ujjawal-khare <[email protected]>
Labels: go (add ONLY when ready to merge, run all tests)
Projects: None yet
9 participants