0.16.0 is crashing: connmgr: concurrent map iteration and map write #9374

Closed
3 tasks done
hsn10 opened this issue Oct 27, 2022 · 6 comments
Labels
kind/bug: A bug in existing code (including security flaws)
need/triage: Needs initial labeling and prioritization

Comments

@hsn10

hsn10 commented Oct 27, 2022

Checklist

Installation method

ipfs-update or dist.ipfs.tech

Version

Kubo version: 0.16.0
Repo version: 12
System version: amd64/windows
Golang version: go1.19.1

Config

"Datastore": {
    "BloomFilterSize": 2097152,
    "GCPeriod": "24h",
    "HashOnRead": false,
    "Spec": {
      "mounts": [
        {
          "child": {
            "path": "blocks",
            "shardFunc": "/repo/flatfs/shard/v1/next-to-last/2",
            "sync": true,
            "type": "flatfs"
          },
          "mountpoint": "/blocks",
          "prefix": "flatfs.datastore",
          "type": "measure"
        },
        {
          "child": {
            "compression": "none",
            "path": "datastore",
            "type": "levelds"
          },
          "mountpoint": "/",
          "prefix": "leveldb.datastore",
          "type": "measure"
        }
      ],
      "type": "mount"
    },
    "StorageGCWatermark": 99,
    "StorageMax": "40GB"
  },
  "Discovery": {
    "MDNS": {
      "Enabled": true,
      "Interval": 30
    }
  },
  "Experimental": {
    "AcceleratedDHTClient": false,
    "FilestoreEnabled": false,
    "GraphsyncEnabled": false,
    "Libp2pStreamMounting": false,
    "P2pHttpProxy": false,
    "StrategicProviding": false,
    "UrlstoreEnabled": false
  },
  "Gateway": {
    "APICommands": [],
    "HTTPHeaders": {
      "Access-Control-Allow-Headers": [
        "X-Requested-With",
        "Range",
        "User-Agent"
      ],
      "Access-Control-Allow-Methods": [
        "GET"
      ],
      "Access-Control-Allow-Origin": [
        "*"
      ]
    },
    "NoDNSLink": false,
    "NoFetch": false,
    "PathPrefixes": [],
    "PublicGateways": null,
    "RootRedirect": "",
    "Writable": false
  },
  "Internal": {},
  "Ipns": {
    "RecordLifetime": "",
    "RepublishPeriod": "",
    "ResolveCacheSize": 128
  },
  "Migration": {
    "DownloadSources": null,
    "Keep": ""
  },
  "Mounts": {
    "FuseAllowOther": false,
    "IPFS": "/ipfs",
    "IPNS": "/ipns"
  },
  "Peering": {
    "Peers": null
  },
  "Pinning": {},
  "Plugins": {
    "Plugins": null
  },
  "Provider": {
    "Strategy": ""
  },
  "Pubsub": {
    "DisableSigning": false,
    "Router": "gossipsub"
  },
  "Reprovider": {
    "Interval": "18h",
    "Strategy": "all"
  },
  "Routing": {
    "Methods": null,
    "Routers": null,
    "Type": "dhtserver"
  },
  "Swarm": {
    "AddrFilters": [
      "/ip4/127.0.0.1/ipcidr/8",
      "/ip4/169.254.0.0/ipcidr/16"
    ],
    "ConnMgr": {
      "GracePeriod": "30s",
      "HighWater": 800,
      "LowWater": 600,
      "Type": "basic"
    },
    "DisableBandwidthMetrics": false,
    "DisableNatPortMap": false,
    "RelayClient": {},
    "RelayService": {},
    "ResourceMgr": {},
    "Transports": {
      "Multiplexers": {},
      "Network": {},
      "Security": {}
    }
  }
}

Description

Since upgrading from 0.15 to 0.16, IPFS crashes every few hours with messages like this:

https://controlc.com/71120ea0

There is probably something more meaningful at the start, but I can't get to it: it's outside of my terminal scrollback buffer. Any tips on how to log errors to a file?

@hsn10 hsn10 added the kind/bug and need/triage labels Oct 27, 2022
@pbtrung

pbtrung commented Oct 31, 2022

I also got the error.
At the very beginning, it showed:

WebUI: http://127.0.0.1:5001/webui
Gateway (readonly) server listening on /ip4/127.0.0.1/tcp/8080
Daemon is ready
fatal error: concurrent map iteration and map write

goroutine 222 [running]:
github.com/libp2p/go-libp2p/p2p/net/connmgr.peerInfos.SortByValueAndStreams.func1.1(0xc0179d00f0?)
        github.com/libp2p/[email protected]/p2p/net/connmgr/connmgr.go:267 +0x6a
github.com/libp2p/go-libp2p/p2p/net/connmgr.peerInfos.SortByValueAndStreams.func1(0x16b?, 0x21b?)
        github.com/libp2p/[email protected]/p2p/net/connmgr/connmgr.go:277 +0x127
sort.partition_func({0xc0119d9370?, 0xc003e967e0?}, 0x0, 0x2d4, 0x32?)
        sort/zsortfunc.go:157 +0x191

(long error messages skipped here)

@MarcoPolo
Contributor

@pbtrung do you have the full log handy? I'm curious if we can see what else was accessing the map.

@p-shahi

p-shahi commented Oct 31, 2022

There is probably something more meaningful at the start, but I can't get to it: it's outside of my terminal scrollback buffer. Any tips on how to log errors to a file?

@hsn10
I think Kubo uses go-log, so you can set the environment variables GOLOG_FILE and GOLOG_OUTPUT to capture your log output to a file,
or use something like ipfs [arguments...] 2>&1 | tee log.txt
@Jorropo can confirm or deny this

@pbtrung

pbtrung commented Nov 1, 2022

@pbtrung do you have the full log handy? I'm curious if we can see what else was accessing the map.

I used nohup ipfs daemon &, but the log was deleted when I downgraded the version.

@BigLep BigLep mentioned this issue Nov 7, 2022
@cachalots

@pbtrung do you have the full log handy? I'm curious if we can see what else was accessing the map.

Two asynchronous methods were accessing the map at the same time; it seems the map needs to be protected by a lock.
https://github.com/libp2p/go-libp2p/blob/21dc42bd72fc6064c27a098db680e94ced5a76d6/p2p/net/connmgr/connmgr.go#L145

https://github.com/libp2p/go-libp2p/blob/21dc42bd72fc6064c27a098db680e94ced5a76d6/p2p/net/connmgr/connmgr.go#L138

fatal error: concurrent map iteration and map write
goroutine 219 [running]:
github.com/libp2p/go-libp2p/p2p/net/connmgr.peerInfos.SortByValueAndStreams.func1.1(0x4051c75140?)
        github.com/libp2p/[email protected]/p2p/net/connmgr/connmgr.go:267 +0xc8
github.com/libp2p/go-libp2p/p2p/net/connmgr.peerInfos.SortByValueAndStreams.func1(0x5?, 0x180?)
        github.com/libp2p/[email protected]/p2p/net/connmgr/connmgr.go:277 +0xe0
sort.partition_func({0x400ac09350?, 0x4014dc1da0?}, 0x0, 0x263, 0x400ac09278?)
        sort/zsortfunc.go:157 +0x1d0
sort.pdqsort_func({0x400ac09350?, 0x4014dc1da0?}, 0x0?, 0x20000?, 0x400ac09328?)
        sort/zsortfunc.go:114 +0x1b0
sort.Slice({0x121b5e0, 0x400ea346f0}, 0x663?)
        sort/slice.go:23 +0x94
github.com/libp2p/go-libp2p/p2p/net/connmgr.peerInfos.SortByValueAndStreams({0x4050dea000, 0x263, 0x663}, 0x0)
        github.com/libp2p/[email protected]/p2p/net/connmgr/connmgr.go:256 +0x78
github.com/libp2p/go-libp2p/p2p/net/connmgr.(*BasicConnMgr).getConnsToClose(0x40009c7600)
        github.com/libp2p/[email protected]/p2p/net/connmgr/connmgr.go:468 +0x398
github.com/libp2p/go-libp2p/p2p/net/connmgr.(*BasicConnMgr).trim(0x4038b3df78?)
        github.com/libp2p/[email protected]/p2p/net/connmgr/connmgr.go:353 +0x20
github.com/libp2p/go-libp2p/p2p/net/connmgr.(*BasicConnMgr).background(0x40009c7600)
        github.com/libp2p/[email protected]/p2p/net/connmgr/connmgr.go:332 +0x134
created by github.com/libp2p/go-libp2p/p2p/net/connmgr.NewConnManager
        github.com/libp2p/[email protected]/p2p/net/connmgr/connmgr.go:145 +0x374
goroutine 1 [chan receive, 3634 minutes]:
main.daemonFunc(0x4000be64d0, {0x1a764?, 0xffff6a10ea08?}, {0x1320f40?, 0x4000012190})
        github.com/ipfs/[email protected]/cmd/ipfs/daemon.go:591 +0x1e9c
github.com/ipfs/go-ipfs-cmds.(*executor).Execute(0x19ebba0?, 0x4000be64d0, {0xffff6a10ea08, 0x4000d766c0}, {0x1320f40, 0x4000012190})
        github.com/ipfs/[email protected]/executor.go:88 +0xdc
github.com/ipfs/go-ipfs-cmds/cli.Run({0x19fc6f8?, 0x4000019940?}, 0x29aa2a0, {0x4000118020, 0x2, 0x2}, 0x4000120e98?, 0x2b0bc?, 0x4000138010, 0x17b4588, ...)
        github.com/ipfs/[email protected]/cli/run.go:137 +0x6fc
main.mainRet()
        github.com/ipfs/[email protected]/cmd/ipfs/main.go:177 +0x558
main.main()
        github.com/ipfs/[email protected]/cmd/ipfs/main.go:72 +0x1c
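
The trace above shows a sort comparator reading a map while another goroutine writes to it. A minimal sketch of one way to avoid that class of crash follows. It is not the actual fix from go-libp2p, and registry, peer, and snapshot are hypothetical names chosen for illustration. The idea is to copy the values the comparator needs while holding a lock, then sort the copy, so the comparator never touches the shared map.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// peer is a hypothetical stand-in for a per-peer record.
type peer struct {
	id    string
	conns map[string]int // connection name -> open stream count
}

// registry guards all peer records and their conns maps with one mutex.
type registry struct {
	mu    sync.Mutex
	peers map[string]*peer
}

// addConn is the "map write" side; it only mutates under the lock.
func (r *registry) addConn(id, conn string, streams int) {
	r.mu.Lock()
	defer r.mu.Unlock()
	p, ok := r.peers[id]
	if !ok {
		p = &peer{id: id, conns: make(map[string]int)}
		r.peers[id] = p
	}
	p.conns[conn] = streams
}

// snapshot holds the values the sort needs, detached from the shared maps.
type snapshot struct {
	id      string
	streams int
}

// sortedByStreams iterates the maps under the lock, then sorts the copy
// after releasing it.
func (r *registry) sortedByStreams() []snapshot {
	r.mu.Lock()
	out := make([]snapshot, 0, len(r.peers))
	for _, p := range r.peers {
		total := 0
		for _, n := range p.conns {
			total += n
		}
		out = append(out, snapshot{id: p.id, streams: total})
	}
	r.mu.Unlock()

	// Most streams first; ties broken by id for a stable result.
	sort.Slice(out, func(i, j int) bool {
		if out[i].streams != out[j].streams {
			return out[i].streams > out[j].streams
		}
		return out[i].id < out[j].id
	})
	return out
}

func main() {
	r := &registry{peers: make(map[string]*peer)}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			for j := 0; j < 100; j++ {
				// Writers and sorters run concurrently without a fatal error.
				r.addConn(fmt.Sprintf("peer%d", i), fmt.Sprintf("conn%d", j), i+1)
				_ = r.sortedByStreams()
			}
		}(i)
	}
	wg.Wait()
	for _, s := range r.sortedByStreams() {
		fmt.Println(s.id, s.streams)
	}
}
```

Running a sketch like this with go run -race is a cheap way to check that no unsynchronized map access remains; the trade-off is that the snapshot can be slightly stale by the time the sort finishes.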

@lidel lidel changed the title from "0.16.0 is crashing" to "0.16.0 is crashing: connmgr: concurrent map iteration and map write" Nov 15, 2022
@lidel
Member

lidel commented Nov 15, 2022

This was fixed in #9387 and will ship in 0.17.0-rc2 – follow #9319

@lidel lidel closed this as completed Nov 15, 2022