orchagent core dump when executing with -s option (syncMode) #1014

Merged
merged 1 commit into from
Aug 5, 2019

Conversation

dzhangalibaba
Collaborator

  • Before the changes:
    • orchagent calls create_switch first, which uses redis_generic_create() to write to ASIC_DB. syncd handles this request and, because syncd is in syncMode, sends a response with op "getresponse" via internal_syncd_api_send_response().
    • In orchagent, however, g_syncMode is not yet set at this point, so no internal_api_wait_for_response() call is waiting for the response generated by internal_syncd_api_send_response().
    • Later, internal_redis_generic_get() is invoked and receives that response, which was not meant for it. The response causes get_switch_attr to segfault, and orchagent exits.

Aug 1 01:58:34.926936 ASW-7005 NOTICE swss#orchagent: :- clear_local_state: clearing local state
Aug 1 01:58:34.927130 ASW-7005 NOTICE swss#orchagent: :- initSaiRedis: Notify syncd INIT_VIEW
Aug 1 01:58:34.928860 ASW-7005 NOTICE swss#orchagent: :- redis_get_free_switch_id_index: got new switch index 0x0
Aug 1 01:58:34.932105 ASW-7005 NOTICE swss#orchagent: :- main: Create a switch
Aug 1 01:58:34.932105 ASW-7005 NOTICE swss#orchagent: :- redis_set_switch_attribute: disabling buffered pipeline in sync mode
Aug 1 01:58:39.280930 ASW-7005 INFO swss#supervisord 2019-08-01 01:58:30,525 INFO spawned: 'orchagent' with pid 76
Aug 1 01:58:39.280930 ASW-7005 INFO swss#supervisord 2019-08-01 01:58:31,527 INFO success: orchagent entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Aug 1 01:58:42.888892 ASW-7005 INFO swss#orchagent: :- internal_redis_generic_get: response: op = SAI_STATUS_SUCCESS, key = getresponse
Aug 1 01:58:42.891224 ASW-7005 INFO kernel: [ 1368.445943] orchagent[15338]: segfault at 0 ip 00007f62f0da45c5 sp 00007ffe656b6ff0 error 4 in libsaimetadata.so.0.0.0[7f62f0d7a000+56000]
Aug 1 01:58:43.012191 ASW-7005 INFO swss#supervisor-proc-exit-listener: Process orchagent exited unxepectedly. Terminating supervisor...

  • After the changes:
    • g_syncMode is set before create_switch.
    • internal_api_wait_for_response() is now in place when redis_generic_create() runs, and it consumes the response generated by internal_syncd_api_send_response().
    • Later, internal_redis_generic_get() receives the response it expects.

Aug 1 08:02:03.809155 ASW-7005 NOTICE swss#orchagent: :- sai_redis_internal_notify_syncd: notify response: SAI_STATUS_SUCCESS
Aug 1 08:02:03.809503 ASW-7005 NOTICE swss#orchagent: :- sai_redis_notify_syncd: notify syncd succeeded
Aug 1 08:02:03.809503 ASW-7005 NOTICE swss#orchagent: :- sai_redis_notify_syncd: clearing current local state since init view is called on initialized switch
Aug 1 08:02:03.809503 ASW-7005 NOTICE swss#orchagent: :- clear_local_state: clearing local state
Aug 1 08:02:03.809519 ASW-7005 NOTICE swss#orchagent: :- initSaiRedis: Notify syncd INIT_VIEW
Aug 1 08:02:03.809519 ASW-7005 NOTICE swss#orchagent: :- redis_set_switch_attribute: disabling buffered pipeline in sync mode
Aug 1 08:02:03.809542 ASW-7005 NOTICE swss#orchagent: :- redis_get_free_switch_id_index: got new switch index 0x0
Aug 1 08:02:03.809876 ASW-7005 INFO syncd#syncd: :- check_notifications_pointers: SAI_SWITCH_ATTR_FDB_EVENT_NOTIFY: 0x55D912F93890 (orch) => 0x55E2CF67E520 (syncd)
Aug 1 08:02:03.809910 ASW-7005 INFO syncd#syncd: :- check_notifications_pointers: SAI_SWITCH_ATTR_PORT_STATE_CHANGE_NOTIFY: 0x55D912F938A0 (orch) => 0x55E2CF67E7E0 (syncd)
Aug 1 08:02:03.809910 ASW-7005 INFO syncd#syncd: :- check_notifications_pointers: SAI_SWITCH_ATTR_SWITCH_SHUTDOWN_REQUEST_NOTIFY: 0x55D912F938B0 (orch) => 0x55E2CF67E940 (syncd)
Aug 1 08:02:03.810736 ASW-7005 INFO swss#orchagent: :- internal_api_wait_for_response: waiting for response 0
Aug 1 08:02:03.810736 ASW-7005 INFO swss#orchagent: :- internal_api_wait_for_response: wait for 0 api response
Aug 1 08:02:08.372356 ASW-7005 INFO swss#supervisord 2019-08-01 08:01:59,642 INFO spawned: 'orchagent' with pid 77
Aug 1 08:02:08.372356 ASW-7005 INFO swss#supervisord 2019-08-01 08:02:00,644 INFO success: orchagent entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Aug 1 08:02:11.835573 ASW-7005 INFO swss#orchagent: :- internal_api_wait_for_response: response: op = SAI_STATUS_SUCCESS, key = getresponse
Aug 1 08:02:11.835717 ASW-7005 NOTICE swss#orchagent: :- main: Create a switch
Aug 1 08:02:11.836461 ASW-7005 INFO swss#orchagent: :- internal_redis_generic_get: response: op = SAI_STATUS_SUCCESS, key = getresponse
Aug 1 08:02:11.836618 ASW-7005 INFO swss#orchagent: :- meta_generic_validation_post_get_objlist: SAI_SWITCH_ATTR_DEFAULT_VIRTUAL_ROUTER_ID:SAI_ATTR_VALUE_TYPE_OBJECT_ID returned get object on list [0] oid 0x3000000000024 object type 3 does not exists in local DB (snoop)
Aug 1 08:02:11.836618 ASW-7005 NOTICE swss#orchagent: :- main: Get switch virtual router ID 3000000000024

@dzhangalibaba
Collaborator Author

This issue reproduces every time. It looks like the merged syncMode commit was not tested.

@lguohan lguohan requested a review from kcudnik August 1, 2019 15:03
@lguohan
Contributor

lguohan commented Aug 1, 2019

retest this please

orchagent/main.cpp (review thread resolved)
@dzhangalibaba
Collaborator Author

It looks like the vs test itself has some issues?

@kcudnik
Contributor

kcudnik commented Aug 2, 2019

Yes, it seems so. @lguohan, have you seen this before? I have seen other test failures before that were unrelated to the actual PR. All of the errors are:
E ReadTimeout: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)

Ah, it may be related to the reboot of the test bed today.

@kcudnik
Contributor

kcudnik commented Aug 2, 2019

retest this please

@lguohan
Contributor

lguohan commented Aug 2, 2019

retest this please

@lguohan
Contributor

lguohan commented Aug 3, 2019

retest this please

@stcheng
Contributor

stcheng commented Aug 5, 2019

retest this please

@jleveque
Contributor

jleveque commented Aug 5, 2019

Retest this please

@stcheng stcheng merged commit 264e548 into sonic-net:master Aug 5, 2019
@dzhangalibaba dzhangalibaba deleted the swss_syncMode branch September 27, 2019 23:03
EdenGri pushed a commit to EdenGri/sonic-swss that referenced this pull request Feb 28, 2022
The original fan status can be one of "OK", "Not OK", "N/A". This PR allows a new fan status "Updating".
If fan status is not "true" or "false", display the status field value in CLI output.