Refactor for new cloudflare requirements #95

Merged
merged 6 commits into j-andrews7:v0.4.0
Oct 29, 2024

Conversation

seankim658

Refactors to use the cloudscraper library instead of mechanicalsoup. Fixes issue #93.
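For readers unfamiliar with the swap, here is a minimal sketch of what a cloudscraper-based login could look like. It is not the code from this PR. The `email`/`password` form field names, the `LOGIN_URL` constant, and the optional `session` parameter are assumptions made for illustration; only the `handlers/login_handler.php` path comes from this thread.

```python
# Assumed URL: kenpom.com plus the login handler path seen in this thread.
LOGIN_URL = "https://kenpom.com/handlers/login_handler.php"


def login(email, password, session=None):
    """Log in and return the session (form field names are assumed)."""
    if session is None:
        # Imported lazily so the function can be exercised with a stub session.
        import cloudscraper

        # create_scraper() returns a requests.Session subclass that tries to
        # get past Cloudflare's anti-bot page before returning a response.
        session = cloudscraper.create_scraper()

    response = session.post(LOGIN_URL, data={"email": email, "password": password})
    response.raise_for_status()  # surface HTTP errors instead of failing later
    return session
```

Because the scraper is a `requests.Session` subclass, the rest of the library could keep calling `session.get(url)` to fetch pages after logging in.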

@seankim658
Author

I just ran the tests and got 3 failures out of the 24 test cases. Let me look into that quick and I'll make a new commit to correct whatever went wrong.

@seankim658
Author

Ok passes all test cases now.

@j-andrews7
Owner

Generally looks pretty good to me, I'll kick the tires a bit further when I get a chance. My one worry is that it (unsurprisingly) does away with some of the Cloudflare interception handling that lets the user know what's going wrong.

...Of course all of that does nobody any good if there's no way past them anyway, which seems like it might currently be the case. And we can always add some of that back in the future.

@esqew may have some additional thoughts.

@seankim658
Author

I realized I didn't include this info in the issue, but the Cloudflare JS challenge I was running into in #93 wasn't being caught by the interception handling. The login function was failing here:

Traceback (most recent call last):
  File "/home/seank/projects/personal/kenpom-upstream/kenpompy/main.py", line 8, in <module>
    scraper = login(username, password)
  File "/home/seank/projects/personal/kenpom-upstream/kenpompy/kenpompy/utils.py", line 36, in login
    browser.select_form('form[action="handlers/login_handler.php"]')
  File "/home/seank/.local/lib/python3.10/site-packages/mechanicalsoup/stateful_browser.py", line 241, in select_form
    raise LinkNotFoundError()
mechanicalsoup.utils.LinkNotFoundError

So the Cloudflare JS detection HTML was being returned instead of the kenpom home page. That page has no login form, so the select_form call failed.

The returned HTML contained the challenge-error-text element asking to enable JavaScript:

<!DOCTYPE html>
<html lang="en-US">
    <head>
        <title>Just a moment...</title>
        <meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
        <meta content="IE=Edge" http-equiv="X-UA-Compatible" />
        <meta content="noindex,nofollow" name="robots" />
        <meta content="width=device-width,initial-scale=1" name="viewport" />
        <style>
            * {
                box-sizing: border-box;
                margin: 0;
                padding: 0;
            }
            html {
                line-height: 1.15;
                -webkit-text-size-adjust: 100%;
                color: #313131;
                font-family: system-ui, -apple-system, BlinkMacSystemFont, Segoe UI, Roboto, Helvetica Neue, Arial, Noto Sans, sans-serif, Apple Color Emoji, Segoe UI Emoji, Segoe UI Symbol, Noto Color Emoji;
            }
            body {
                display: flex;
                flex-direction: column;
                height: 100vh;
                min-height: 100vh;
            }
            .main-content {
                margin: 8rem auto;
                max-width: 60rem;
                padding-left: 1.5rem;
            }
            @media (width <= 720px) {
                .main-content {
                    margin-top: 4rem;
                }
            }
            .h2 {
                font-size: 1.5rem;
                font-weight: 500;
                line-height: 2.25rem;
            }
            @media (width <= 720px) {
                .h2 {
                    font-size: 1.25rem;
                    line-height: 1.5rem;
                }
            }
            #challenge-error-text {
                background-image: url(data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHdpZHRoPSIzMiIgaGVpZ2h0PSIzMiIgZmlsbD0ibm9uZSI+PHBhdGggZmlsbD0iI0IyMEYwMyIgZD0iTTE2IDNhMTMgMTMgMCAxIDAgMTMgMTNBMTMuMDE1IDEzLjAxNSAwIDAgMCAxNiAzbTAgMjRhMTEgMTEgMCAxIDEgMTEtMTEgMTEuMDEgMTEuMDEgMCAwIDEtMTEgMTEiLz48cGF0aCBmaWxsPSIjQjIwRjAzIiBkPSJNMTcuMDM4IDE4LjYxNUgxNC44N0wxNC41NjMgOS41aDIuNzgzem0tMS4wODQgMS40MjdxLjY2IDAgMS4wNTcuMzg4LjQwNy4zODkuNDA3Ljk5NCAwIC41OTYtLjQwNy45ODQtLjM5Ny4zOS0xLjA1Ny4zODktLjY1IDAtMS4wNTYtLjM4OS0uMzk4LS4zODktLjM5OC0uOTg0IDAtLjU5Ny4zOTgtLjk4NS40MDYtLjM5NyAxLjA1Ni0uMzk3Ii8+PC9zdmc+);
                background-repeat: no-repeat;
                background-size: contain;
                padding-left: 34px;
            }
            @media (prefers-color-scheme: dark) {
                body {
                    background-color: #222;
                    color: #d9d9d9;
                }
            }
        </style>
        <meta content="390" http-equiv="refresh" />
    </head>
    <body class="no-js">
        <div class="main-wrapper" role="main">
            <div class="main-content">
                <noscript>
                    <div class="h2"><span id="challenge-error-text">Enable JavaScript and cookies to continue</span></div>
                </noscript>
            </div>
        </div>
        <script>
            (function () {
                window._cf_chl_opt = {
                    cvId: "3",
                    cZone: "kenpom.com",
                    cType: "managed",
                    cRay: "8d841b98afcc0650",
                    cH: "T_1E.n3BlTUPaWeR77C4iA5VLxJW_GzGiNYj8PkU2RE-1729879243-1.2.1.1-Eh9hYE8REn2iGpR7aKdPJOAjd9tkzHyN8ZvqPi2XJM1iO9vzKWs6tHY9pEdP.0Gk",
                    cUPMDTk: "\/index.php?__cf_chl_tk=BUZVra4deaH_JcZ4B0YBDzMgl82Cy1pgw0rcIGkKlFI-1729879243-1.0.1.1-7DuWua2qk_4BRNqhajYekpxfhY3qrdz9GTCphNZww3E",
                    cFPWv: "b",
                    cITimeS: "1729879243",
                    cTTimeMs: "1000",
                    cMTimeMs: "390000",
                    cTplV: 5,
                    cTplB: "cf",
                    cK: "",
                    fa: "\/index.php?__cf_chl_f_tk=BUZVra4deaH_JcZ4B0YBDzMgl82Cy1pgw0rcIGkKlFI-1729879243-1.0.1.1-7DuWua2qk_4BRNqhajYekpxfhY3qrdz9GTCphNZww3E",
                    md:
                        "bWf7BwCOJJsjIpVmb3yoY5GNweGi17ilxtGZxhT8zis-1729879243-1.2.1.1-bzn3vH.0VTcwhxQEHn.3hbBNmFhQ6d.9eeG2YooSc4vQWBRULQQY2iqWLZJGH7bLNjcPLSmbX8oM0gyROGCvui26J.CeivFroIA9xLxzwHUqcslRCKmHhsVAD_UiZpQDctv5va0JKh8h6JfkcTMyk7c9u4P5bapI78_8u1DdnGN6ZpiLnqg7oti8ORdHPRLoat63Fc77KlSHJrP4eVYL3wtk8sVxYA983fk9hsOM5f1_hg.z4TkuyZ2bWE9esM24ouV_IV.V7k7vxGfX0BsY8SFVBfpvFyzkDCuae3j9mbwzDG0ZQlN.h3HqoE3h56Ud.XYS8xlwML3lVQ_sDKof2YeekrEOTAeWakBtDCDm5JaYhZi9RlHfealM5uKqgxfs2.oV3DIbaBN6L6IMlrzm46v9XdL1bkqtTftK_lv4HLp7ONyBxMKNm36BBVQc_KgHMQqbsZYL27uBWbqOU4a7Ozi.TtakdFpb02aImifPkrxrzHruE23B5thSnOEWcb2fQauW8EQvX1_CTBwipW6KfsoOxhMVWp.1dYhs922ohFYK_FLgKVXl6csZ5M.kEP4xNt0bFZ5kgmeNkFWVqJuUbqk0WKDFk3UymA4xYcJGxag34Wmx7Jz2ZMLZEVwtgJ26mjSWZBBGnMCO6bpiuAOSPXlF4UumStfdZBQ3fyd4PxtLNSctGQPgAF6otQQJkG_7UJ0RM3bvbn.czTx_KMHMox2uYHX4H0Ue_9a3OKjjznEAHmn0BA91UkqQg5SNYS2752Yed0_TkeEBlAk4.zhePOtK999ZlH8QSJcDWJDOtVdsLaB6eT_q.DpDION1wb1yD4bthm_At.iWWS.Ij8o549AlhZ1xVG4v6mqnCCDe.WwDnG69N7RQXD1s1m2ApSEqvFmFJ27U_6YEk.opnOd_3H_YjLbNCHXoOhnrL4Foh7.XCPHnGxqRoysXb7ABS95qL94h.w7DbJsKlKj1PT2f1FqaRGEf1.RM9YYjnJoZEYE98ufvpQqW4ncAKrvo2QKY1S_EH1af30pIXRZbtC6NRuAjqNP7TMM.L_QOYDkZ56JVqygZFUS_9AV9pIZp6mMcUcHLwlalbKwPSvo77YJO8w3s3GGZ05VMcBEwea1Mc_eYIUey57MyZNoskjs84i7xwKqBdnOn.uk3bcEORssrMiLLu8z2qO2R98Sq9E20VdBM52mDKete_Ve93tH7E4FQ048C1MogrY6FckQTFwZi0yc0VumLU_FD_sCuMgDyRglTjckQ3oEroenDhzz3rr9C.Io3PGZwHAhpG75v1YYL0NOoPuWQn86w71UaNQ_1kCQpgz4nlHOacwQR5oZuypv1eXMOmJyJjHL12Z.X7RGKeaBHsnGGd4bZpThoW47cmefJM4_BAggs4gZnP3KQTjH.hhMo9PlN4iEwNgzTkcAUVZ9Ho_DccAjHf1kmhbFqtqodOcw5de5Me1N2m8JLIOdnjyP__HE5RQ3mKdO0Wnl0W08Kz.cKGFQebXyJ_h2Dt4jQ41ETjFPzW5Q0c2BmM7NSxSaWNFL5FA01xR07X3zs5zLYyarvrPPATl0rMfXtuJLBqu3NNTmvqZjAp5GdZTGPOTQvLHckMrePh9fyJAccIrCQJkS1yEVgzaVRC9E0T6wDdGHCk8ioRKbh_0owOkMiPAqH29qt5afV2xLY7naUJAJPYZjtTckEmli1NB2pdkjM5mLyHorlnRRHOn7PY_1n3_O2zzpuGByi.v0GqsZIcr6lYfV8mw3nWQQa5tPtJP5asDpj_O.I7pT5osuPU9o2eU_wos1wq10okYEVMt4CW8I0hoZxnGlFiMKdVYS1_OThKTiRagRZ3Fippm7pBgZQMN2xg_uTxnuWzeRcmxm0NFESZ0Hp.sERjPet9_zWbsZOUN_NA8Vn_p8Y
zLix9lAzlIJjneFYckwsz_KaANqbSfXgHW7nncNjCrgFsIkD8HyY6Fnxa74E7GJIzz5R8LVXaZqOzozci_x1QA4C3pX9ckwDpoRaFAp_ZtObR0t5CfYR6PR.LM9WpKNgwtjQb2z1KVHWnUejCIeN0glZFdAGHmlahzTsJrxooy4fBZ2Xz106ObyePUkPrOckky5t3Rwe9Osx6ESBHSCNdjxW.gf9pcQSlTyJyioVOi82tBRSNesuXgY3qXOlTaTZtsMhBYoNIl7hwuUG9Vp39XL1AMaYRA",
                    mdrd:
                        "thZe5pKCvBkMZq9Ow_naSE..nky8ZGAFvygi9IO4Yvo-1729879243-1.2.1.1-hPCirwZKUhXnaWj5lH43K8h5sU4w5RBMjt5h9igoG2Qst_q6l8K4b2RqLa7v8GypM9N0S.FRZA1a5vXdRBuZJ2yaVGWA7jhDnbA.9.lacBe0_qGoNQYk1aSjOTmaTsHZrgcuDaPfBWstsCeRfaH_wRDoSDcCv00N4WwOSWTbJU4tMoOGYkel1wk._ax4rGtQv5quvH5cg0UPOhE7A7NFMaBjTE9FS6iTlTr9U9wrONpF7JhBbXrT4Yr940eqZRL.sqqticT4yizKONpULzODe4RAlWvo70yyMlGnyNblBHTgt6312OAzw8g671PUVBDmdAAUS1ystjor9KeH7liUqYSfwO8RcPzszjYJZLQfTCA57Nj.1XVFExqw.fBnomeCcg6KFX97oxYe1R2tc8qDaJYA3WKeGWCoT3V_qmc4JDfOeRdeP6hJYhZIJSVeT8t._tcBmYQPbnY0N4to4mc3G4_T7qK.qu6DyogYfViFT9evSAyk57._2vEEvi0wsP.ECCJrtpiNspuOyxS4jepzjUTY.HgTBI842mN_jX_06Oauag_8AylYFh63jypUbritD9gwKvnPmWtfjSLIlZIAiOKlA_mMZeF5imdL.GepxB7FCmA7XgKkTaEWfwZN3qCg6exb9rTFp4ey2Ia1LDNUC2wCHpXsk3lLIO9ZJqYG7VA0GTKVH0glBVdcp8GoyAfr8qSeqk0B86HP9sPCsiqcRteBxYYnopUrAiR54XdvSi5pyji4mfaOem.YuftpKi1AW3O93p7J_IOC.YFLJ5DdIJ4dZJskyfZNiT2hwjtJl7ujO1h71lZGR8jKXNJFuripZDPE198kYQNhpy5CMrNRbB6TuRBYujAlBVjDYrpBwT4fSeomC3Cfneywhh0aVWtx6EsHbFLpAm0UxiuWOHBAEAUFghUU9.JYNUPWKdORqBUrU5AP_hccl1shEnGSFXg.RqOvtX5yoZu828Pn3YKQxVU2QBm0LUrdwu4sYKDlodoT65moJeRQ2ynqST8aGrpgAaun9lHvkxJ85Cg5a.aic3J3JZLqv3K3QpkuQ9dxmAsj8PDJBEotayzTi44txPJOAUOyqIV_1vNmXTVSVHisQuY8hZB4ip4jNBJvWTz7_5NEXQTjmpWy6omTkBxBCoBWVLABYKXmoD21dKEI55pF7atRF3CuV7kDEG9PCyxQbhdgLRlSM3fbMtNlCXo0jcNTEDyP9NzAz0fjjmQVuwzp.b5b55W3ZUOS6W6ZIQ4zHypoMRDe6BGGS_XfeUCpfq70iwSEbk_xmtOjQkKt5mgUwVRDzOjLSzDl4gG2rVrQEWdaRybwhK8SoVOFbVJ00bkgPpxGI56lqNnxqqgkpcbf0W_kZRQghYBIPvgAT9HW9AuFOIkVe03gYCrE3gv.LzCuV8tyOrU3h4MPf5pRmIIP_wFyoTvDoNSWuoL5Vz.3HHRJLw6sSlXYJnuWUYgmzy7JzVhY9MvCILIpU61n.VIYMIj5M2sO1aJ0J5ZxdURUcCzfWQWQdeC.AjMUKFk_ahPC7griNrZg6WT5L3bk7I7xYEpJrqMh35xXUJhFjA3eoOlBDBmNhv8WUInOuOLSprpaj4ywyUIdBe5prmm2MbZOt.YXtBc2TZpAVdDa_VGzZ817CkENLU9qaxACZ0e9HxeaZTg_m6dW7UuXeqmUG5r4mqT_wgvEGMoaw2DNMXgumt5BjCgFPjWGfBC8GueqnfwyYked6yfG7OFwVl3dJduV3yEv894rL_UKxZRVSBQrx4ciwH1YGWLn8YfsM99MDx0u8tWTbKeIqa1sSY2x4Zjda4MPPJJnfbkOkcKsBByY6dduLQe_KituEJsidxMV.SJMmwz9QLVBIIHi2NbIb.7hj6qY.g0TEEOtUb0dA0EsTLocchmIQnr4zFx4
VGFsNZ3TjYy.WLB14qqnK_mpO6pVnh4v7wZYqhY3T.ERm3jtXimja.EEtf_f0Ba3.7zQywdV6BaZF50OK0QrtZ_DfEQiaGAWqG401IXm_42eo5.BfWCxz2nhAjqgsyWOnh29jLAi7U2ubcstveSVRBDlCac6f5A1_H4pTVH8p7eckui3iEVbFyvWcs3I8teq9QCKD1Cs5P2wMCJ0u4EWILAVnUCxC8MZKpDZAhZ9kZMpKhLRpbs62XB9iFEzGcZknuS0p33iupXcjtueui5fs9qmskzjyg",
                };
                var cpo = document.createElement("script");
                cpo.src = "/cdn-cgi/challenge-platform/h/b/orchestrate/chl_page/v1?ray=8d841b98afcc0650";
                window._cf_chl_opt.cOgUHash = location.hash === "" && location.href.indexOf("#") !== -1 ? "#" : location.hash;
                window._cf_chl_opt.cOgUQuery = location.search === "" && location.href.slice(0, location.href.length - window._cf_chl_opt.cOgUHash.length).indexOf("?") !== -1 ? "?" : location.search;
                if (window.history && window.history.replaceState) {
                    var ogU = location.pathname + window._cf_chl_opt.cOgUQuery + window._cf_chl_opt.cOgUHash;
                    history.replaceState(null, null, "\/index.php?__cf_chl_rt_tk=BUZVra4deaH_JcZ4B0YBDzMgl82Cy1pgw0rcIGkKlFI-1729879243-1.0.1.1-7DuWua2qk_4BRNqhajYekpxfhY3qrdz9GTCphNZww3E" + window._cf_chl_opt.cOgUHash);
                    cpo.onload = function () {
                        history.replaceState(null, null, ogU);
                    };
                }
                document.getElementsByTagName("head")[0].appendChild(cpo);
            })();
        </script>
    </body>
</html>
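A check like the following could catch this page before `select_form` (or any other parsing) runs. It is only a sketch: the function names are made up for illustration, and the markers are simply strings copied from the HTML above, so they may stop matching if Cloudflare changes its template.

```python
import re

# Strings that appear in the challenge page quoted above.
CHALLENGE_MARKERS = (
    "<title>Just a moment...</title>",
    "window._cf_chl_opt",
    'id="challenge-error-text"',
)


def is_cloudflare_challenge(html):
    """Return True if the HTML looks like a Cloudflare challenge page."""
    return any(marker in html for marker in CHALLENGE_MARKERS)


def challenge_details(html):
    """Pull cType and cRay out of the inline _cf_chl_opt object, if present."""
    details = {}
    for key in ("cType", "cRay"):
        match = re.search(key + r':\s*"([^"]*)"', html)
        if match:
            details[key] = match.group(1)
    return details
```

Including the `cRay` value in an error message would make bug reports easier to correlate with what Cloudflare actually served.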

@esqew
Collaborator

esqew commented Oct 25, 2024

Thanks for this PR! I'd like to take a deeper dive into both the issue and this PR before approving, but preliminarily I don’t have an issue with this.

@j-andrews7 j-andrews7 changed the base branch from master to v0.4.0 October 28, 2024 02:28
@j-andrews7
Owner

Went ahead and rebased this to merge into a new v0.4.0 branch since it's a bit more of a fundamental change. If we get this rolled in and fix #92, #94, and #90, I'll be pretty happy with it and push a new version to PyPI.

@seankim658
Author

seankim658 commented Oct 29, 2024

Once this is rolled into v0.4.0, I'm happy to open a PR for #92. I can also take care of #90.

Building the Sphinx docs locally worked fine for me, so without more info on what is going wrong there, I'm not sure how to debug it.

@seankim658
Author

Just tried re-running the test cases locally and they passed. One thing I remember is that over the summer I noticed that I would get blocked when running inside any type of non-local environment. I know for sure that running the login function from inside a Docker container gets blocked, so I'm wondering if this is a similar issue. I can look into it this week and, at the very least, add some interception handling for a more descriptive error message.
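As a rough idea of what that interception handling could look like, here is a sketch. `CloudflareBlockedError` and `raise_if_challenged` are hypothetical names, not part of kenpompy or cloudscraper. The check assumes Cloudflare marks challenge responses with a `cf-mitigated: challenge` header and falls back to looking for the `_cf_chl_opt` script variable in the body.

```python
class CloudflareBlockedError(RuntimeError):
    """Hypothetical exception; not part of kenpompy or cloudscraper."""


def raise_if_challenged(response):
    """Raise a descriptive error if the response is a Cloudflare challenge.

    `response` is any object with `headers` and `text` attributes,
    for example a requests.Response.
    """
    mitigated = response.headers.get("cf-mitigated", "")
    has_challenge_script = "_cf_chl_opt" in response.text
    if mitigated.lower() == "challenge" or has_challenge_script:
        raise CloudflareBlockedError(
            "Cloudflare served a challenge page instead of the requested page. "
            "This has been seen when running inside Docker or other non-local "
            "environments; try running from a local machine."
        )
    return response
```

Calling this right after each request would replace an opaque parsing failure with a message that points at the real cause.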

@j-andrews7
Owner

I believe the issue in this case is that the repo secrets for Actions are unavailable because this is being run from a fork, and GitHub blocks secrets from being exposed or used in that situation for security reasons.
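One way to keep fork PRs from showing failed checks for this reason would be to skip the login-dependent tests when credentials are missing. The sketch below uses the standard library's `unittest`; the environment variable names `KENPOM_EMAIL` and `KENPOM_PASSWORD` are assumptions and may not match what the workflow in this repo actually uses.

```python
import os
import unittest


def credentials_from_env(env=os.environ):
    """Return (email, password), or None if either variable is unset or empty.

    The variable names are assumptions made for this sketch.
    """
    email = env.get("KENPOM_EMAIL")
    password = env.get("KENPOM_PASSWORD")
    if email and password:
        return email, password
    return None


@unittest.skipUnless(credentials_from_env(), "credentials unavailable (e.g. fork PR)")
class LoginTests(unittest.TestCase):
    def test_credentials_present(self):
        # Placeholder for a real login test that needs the secrets.
        email, password = credentials_from_env()
        self.assertTrue(email and password)
```

Checking for empty values as well as missing ones matters here, since a secret that is withheld from a workflow may still show up as an empty variable.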

Regardless, I am going to merge this for now so we can deal with the other issues.

@j-andrews7 j-andrews7 merged commit b3356b6 into j-andrews7:v0.4.0 Oct 29, 2024
0 of 15 checks passed
3 participants