{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":55696810,"defaultBranch":"main","name":"webarchive-indexing","ownerLogin":"commoncrawl","currentUserCanPush":false,"isFork":true,"isEmpty":false,"createdAt":"2016-04-07T13:28:13.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/1194841?v=4","public":true,"private":false,"isOrgOwned":true},"refInfo":{"name":"","listCacheKey":"v0:1725930388.0","currentOid":""},"activityList":{"items":[{"before":null,"after":"48a3672474ed0c59cdbf33121dbfcb9a062a5361","ref":"refs/heads/spark-indexwarcs","pushedAt":"2024-09-10T01:06:28.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"jt55401","name":"Jason Grey","path":"/jt55401","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1494409?s=80&v=4"},"commit":{"message":"Add an alternate spark version of indexwarcsjob (without mrjob)","shortMessageHtmlLink":"Add an alternate spark version of indexwarcsjob (without mrjob)"}},{"before":"efaa6f1dfcbdb5192a8c78b0993c0eade6202208","after":null,"ref":"refs/heads/redaction-wet-wat","pushedAt":"2024-06-17T18:54:59.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"jt55401","name":"Jason Grey","path":"/jt55401","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1494409?s=80&v=4"}},{"before":"a14805422c455c963ca261ea387b2580c7a0ef18","after":"69c1519b0367b9ec94bafb75841572c1608625fd","ref":"refs/heads/main","pushedAt":"2024-06-17T18:54:53.000Z","pushType":"pr_merge","commitsCount":3,"pusher":{"login":"jt55401","name":"Jason Grey","path":"/jt55401","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1494409?s=80&v=4"},"commit":{"message":"Merge pull request #6 from commoncrawl/redaction-wet-wat\n\nadded replacements for wet/wat paths so we get cdx filenames out","shortMessageHtmlLink":"Merge pull request #6 from 
commoncrawl/redaction-wet-wat"}},{"before":"8cff8a0ac8e04fe1f6b2609965ac7672062eff9b","after":"efaa6f1dfcbdb5192a8c78b0993c0eade6202208","ref":"refs/heads/redaction-wet-wat","pushedAt":"2024-06-17T16:05:59.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"jt55401","name":"Jason Grey","path":"/jt55401","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1494409?s=80&v=4"},"commit":{"message":"made name matching more specific as Sebastian suggested","shortMessageHtmlLink":"made name matching more specific as Sebastian suggested"}},{"before":null,"after":"8cff8a0ac8e04fe1f6b2609965ac7672062eff9b","ref":"refs/heads/redaction-wet-wat","pushedAt":"2024-06-15T00:14:33.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"jt55401","name":"Jason Grey","path":"/jt55401","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1494409?s=80&v=4"},"commit":{"message":"added replacements for wet/wat paths so we get cdx filenames out","shortMessageHtmlLink":"added replacements for wet/wat paths so we get cdx filenames out"}},{"before":"1a362dfc30e96e124da36fb4b1d1cb1dbb4b7a04","after":"a14805422c455c963ca261ea387b2580c7a0ef18","ref":"refs/heads/main","pushedAt":"2023-04-12T12:42:39.000Z","pushType":"push","commitsCount":2,"pusher":{"login":"sebastian-nagel","name":"Sebastian Nagel","path":"/sebastian-nagel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1630582?s=80&v=4"},"commit":{"message":"ZipNumClusterJob not to write temporary output to s3://commoncrawl/\n(implements #5)\n- add option --zipnum-dir defining the location (upload path)\n for ZipNum CDX files (cdx-nnnnn.gz), independent of the output\n directory (--output-dir) where part-nnnnn files are written to,\n later concatenad to the cluster.idx file\n- point --output-dir to an internal bucket holding temporary\n index data\n- remove obsolete option `--s3-upload-acl`\n- simplify index publication script (no clean-up on publication\n bucket 
required)","shortMessageHtmlLink":"ZipNumClusterJob not to write temporary output to s3://commoncrawl/"}},{"before":"12d5a9a014236c2030f31b02a2a8e3a776266270","after":"a14805422c455c963ca261ea387b2580c7a0ef18","ref":"refs/heads/5-zipnum-output-dir","pushedAt":"2023-04-03T20:54:52.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"sebastian-nagel","name":"Sebastian Nagel","path":"/sebastian-nagel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1630582?s=80&v=4"},"commit":{"message":"ZipNumClusterJob not to write temporary output to s3://commoncrawl/\n(implements #5)\n- add option --zipnum-dir defining the location (upload path)\n for ZipNum CDX files (cdx-nnnnn.gz), independent of the output\n directory (--output-dir) where part-nnnnn files are written to,\n later concatenad to the cluster.idx file\n- point --output-dir to an internal bucket holding temporary\n index data\n- remove obsolete option `--s3-upload-acl`\n- simplify index publication script (no clean-up on publication\n bucket required)","shortMessageHtmlLink":"ZipNumClusterJob not to write temporary output to s3://commoncrawl/"}},{"before":null,"after":"12d5a9a014236c2030f31b02a2a8e3a776266270","ref":"refs/heads/5-zipnum-output-dir","pushedAt":"2023-03-31T14:48:26.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"sebastian-nagel","name":"Sebastian Nagel","path":"/sebastian-nagel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1630582?s=80&v=4"},"commit":{"message":"ZipNumClusterJob not to write temporary output to s3://commoncrawl/\n(implements #5)\n- add option --zipnum-dir defining the location (upload path)\n for ZipNum CDX files (cdx-nnnnn.gz), independent of the output\n directory (--output-dir) where part-nnnnn files are written to,\n later concatenad to the cluster.idx file\n- point --output-dir to an internal bucket holding temporary\n index data\n- remove obsolete option `--s3-upload-acl`\n- simplify index publication script (no 
clean-up on publication\n bucket required)\n- replace deprecated Hadoop properties","shortMessageHtmlLink":"ZipNumClusterJob not to write temporary output to s3://commoncrawl/"}}],"hasNextPage":false,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"Y3Vyc29yOnYyOpK7MjAyNC0wOS0xMFQwMTowNjoyOC4wMDAwMDBazwAAAASxfzov","startCursor":"Y3Vyc29yOnYyOpK7MjAyNC0wOS0xMFQwMTowNjoyOC4wMDAwMDBazwAAAASxfzov","endCursor":"Y3Vyc29yOnYyOpK7MjAyMy0wMy0zMVQxNDo0ODoyNi4wMDAwMDBazwAAAAMPk2gC"}},"title":"Activity · commoncrawl/webarchive-indexing"}