JS renderer on fly #291

Open
geshan opened this issue Jul 22, 2020 · 20 comments
Labels
example An example project + readme

Comments

@geshan

geshan commented Jul 22, 2020

I wrote a side project which I think is a great fit to try on fly.io.

JS renderer is an online Puppeteer service that renders pages with JavaScript (JS). It is mainly useful for web scraping (without using Splash).

At times while scraping web pages, you will come across websites or pages that only render in a browser that runs the loaded JavaScript. If you curl them or use something like Scrapy, you just end up with HTML that isn't useful.

This project aims to solve that issue with Puppeteer. With Scrapy you can use Splash, but it is Scrapy-specific and not easy to configure.

This would be a great example for Fly.

@mrkurt mrkurt added the example An example project + readme label Jul 23, 2020
@mrkurt
Member

mrkurt commented Jul 23, 2020

This sounds super cool. It's basically a service that executes JS and then returns the resulting DOM? I think with a README about why that's interesting and how it's better if you run it close to certain cities, that's a pretty great example.

I've actually wanted examples of Puppeteer for other stuff:

  1. Screenshots/thumbnails
  2. Lighthouse tests

Could be good for a second example. ;)

@geshan
Author

geshan commented Jul 23, 2020

Hey @mrkurt , appreciate your fast reply.

"It's basically a service that executes JS and then returns the resulting DOM?" Yes, you are right :)

Here is the repo with steps on how to get this app running on fly.io - https://github.com/geshan/js-renderer-fly . Let me know what would be the next step(s) to get it on fly-examples. I am open to editing the Readme too.

I can do a screenshot as a service example as the next one. Thanks!

@codepope

Hi, this is a good start - I have some notes for you -


The opener is talking to an audience that already knows what the problem is, and even what the most common tool is.

I suggest it might start with something like:

"JavaScript is the bane of a web scraper's life. Scraping is all about extracting data from a web page and JavaScript is there adding content, hiding blocks, moving the DOM around and just reading the HTML from the server is just not enough. What you ideally want is a way to run all that JavaScript on the page so you can see what's left after that. Then you can get down to some serious scraping.

There are tools to do this out there but most have their own complications or restrictions that stop them from being used out on the edge. Js-renderer-fly has none of those problems and with Fly, you can deploy close to your users too."

(Roll in the Uses section here, with a practical example - maybe scrape Instagram data and produce a list of pics)

How to deploy it on Fly - move 1 and 2 into prerequisites...

3 - only works if you are logged in with SSH support enabled

5/6 - Run flyctl init - hit return for an app name to be generated (unless there's a name you really want)

You can add flyctl init --dockerfile to skip the picking of the builder

Also, re orgs - first one on the list will be your own org now

Not sure the deploy screenshot adds much - maybe explain the stages briefly? The details aren't added to the fly.toml file, they come from the fly.toml file

11 Not sure what you are saying there.

A tour of commands might be good at this point.... status, restart, pause? Leading into the scale commands and a regions command to put an instance on every continent

A script to do something fun with IG or similar to wrap up completing the task from the start?

@geshan
Author

geshan commented Jul 24, 2020

@codepope I have made the suggested changes here: https://github.com/geshan/js-renderer-fly/pull/6/files let me know if it is ok, thanks!

@codepope

codepope commented Jul 24, 2020

Make a branch and merge the changes into that branch. It's difficult to review article/readme content with just patch files.

Some quick notes though. Explain what puppeteer is, or at least link to it. (see previous note on audience).
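
For readers who have not met it, Puppeteer is a Node.js library that drives a headless Chrome browser. A minimal sketch of the rendering step such a service performs might look like the following. This is an illustration only, not the project's actual code: the function name `renderPage` is hypothetical, and the `puppeteer` module is passed in as a parameter (an assumption made here so the sketch is easy to stub).

```javascript
// Minimal sketch: load a page in headless Chrome and return the HTML
// as it stands after the page's JavaScript has run.
// `renderPage` is a hypothetical name, not taken from the repository.
// In a real app you would pass in `require('puppeteer')`.
async function renderPage(puppeteer, url) {
  const browser = await puppeteer.launch(); // headless by default
  try {
    const page = await browser.newPage();
    // 'networkidle2' treats navigation as finished once there have been
    // no more than 2 network connections for at least 500 ms.
    await page.goto(url, { waitUntil: 'networkidle2' });
    // Serialized DOM, including nodes inserted by scripts.
    return await page.content();
  } finally {
    // Always release the browser, even if navigation fails.
    await browser.close();
  }
}
```

Called as `renderPage(require('puppeteer'), 'https://example.com')`, this should resolve to the rendered HTML string. Waiting for `networkidle2` is one reasonable choice; pages that keep connections open may need a different wait condition.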

@geshan
Author

geshan commented Jul 24, 2020

@codepope merged to master, it can be seen here: https://github.com/geshan/js-renderer-fly . I will add a bit more detail about puppeteer soon. If anything else needs to be added, please let me know, thanks!

@codepope

"Scraping is all about extracting data from a web page and JavaScript is there adding content, hiding blocks, moving the DOM around and just reading the HTML from the server is just not enough."
change to
"Scraping is all about extracting data from a web page and JavaScript is there adding content, hiding blocks and moving the DOM around. Just reading the HTML from the server is just not enough."

"There are tools to do this out there but most have their own compliacations or restrictions that stop them from being used out on the edge. Js-renderer-fly has none of those problems and with Fly, you can deploy to close to your users too.
This is an online puppeteer service to render pages with javascript (js) very useful for web scraping." - pull together to make one para... something like
"There are tools to do this out there but most have their own complications or restrictions that stop them from being used out on the edge. Js-renderer-fly has none of those problems and with Fly, you can deploy to close to your users too. At its core, js-renderer-fly is a puppeteer-based service. Puppeteer is a package which renders pages using a headless Chrome instance, executing the JavaScript within the page."

Uses section seems redundant. Maybe blend with the Quick Try....

Explain that a typical YouTube page adds the view count in JavaScript and that, to get that value, we're going to use js-renderer-fly to pull it out after the JavaScript has run.
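
To make that idea concrete, here is a rough sketch of pulling a view count out of rendered HTML. The markup it looks for (`class="view-count"`) and the function name `extractViewCount` are assumptions for illustration only; the real page structure will differ and changes over time, and this is not the repo's `yt-views.js`.

```javascript
// Hypothetical markup: assumes the rendered HTML contains something like
//   <span class="view-count">1,234,567 views</span>
// which is only present once the page's JavaScript has run.
function extractViewCount(renderedHtml) {
  const match = renderedHtml.match(/class="view-count"[^>]*>\s*([\d,]+)\s*views/i);
  if (!match) {
    return null; // the value was not in the HTML (e.g. scripts never ran)
  }
  // Drop the thousands separators before converting to a number.
  return Number(match[1].replace(/,/g, ''));
}
```

Run against the raw server response, a function like this would typically return `null`; run against the output of the rendering service, it has something to find. For anything beyond a demo, a proper HTML parser is a safer choice than a regular expression.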

The quick-try ideally should prompt the user at that point to clone the github repo.

Will have to go over it for typos and things like "Then select and org"....

Pull the resources section into the "More Fly Commands" section so you cover lifecycle, vertical scaling and global scaling.

That's about it for now.

@geshan
Author

geshan commented Jul 24, 2020

Hi @codepope , I have done most of the changes: https://github.com/geshan/js-renderer-fly.

I have done a quick typo and grammar fix with grammarly, thanks for the ping.

"The quick-try ideally should prompt the user at that point to clone the github repo." This is the part I am not clear about. So this node script should clone this repo and try to deploy it for the user?

Let me know if more changes are required, thanks!

@codepope

At that point, reading through, the user will not have downloaded anything. So, you'd likely want to suggest they either grab the script from the repo or clone the repo before discussing the script.

@geshan
Author

geshan commented Jul 24, 2020

@codepope Just fixed that part too. Thanks for the ping, it made sense. Latest changes are here: https://github.com/geshan/js-renderer-fly . Open to suggestions.

@geshan
Author

geshan commented Jul 27, 2020

@codepope feedback is welcome :)

@codepope

generally youtube -> YouTube

The installer line
git clone git@github.com:geshan/js-renderer-fly.git && cd js-renderer-fly && npm install && node yt-views.js
is a bit daunting and probably better broken down into separate lines

Also, the scraper app is a bit of a black box - maybe a line or two about what it does? (and a mention for the axios library) so people could use it to kickstart actually writing a scraper.

Run Locally now repeats those instructions too...
I suggest making "Quick Try" a "Quick Start" and adding a subsection for installing and a subsection for "Your first scraping" or something. "Use it as a service" should point out that the instructions later will show you how to deploy it as a service and that you are just showing how it works with an already deployed version.

"if you are logged in the SSH support enabled else try" else->otherwise

" I tried with: " -> "I ran it with js-renderer-fly as the app name for the examples"

"Subsequently, you can select an organization. Generally, it will be your first name-last name on the prompt" Generally, it->Usually, this will...

Step 9 is confusing. I think you are trying to get two things over at once. Also, it's not clear what the command line should be: flyctl open /api/render?url=<your-url>, I assume.
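
As a small illustration of the `/api/render?url=<your-url>` form mentioned above, the target address needs to be percent-encoded when it is placed in the query string. The sketch below builds such a URL; the helper name `buildRenderUrl` and the host `my-app.example` are placeholders, not names from the project.

```javascript
// Builds a request URL of the form <service>/api/render?url=<your-url>.
// `buildRenderUrl` is a hypothetical helper; `serviceBase` is wherever
// the app happens to be deployed.
function buildRenderUrl(serviceBase, targetUrl) {
  const endpoint = new URL('/api/render', serviceBase);
  // searchParams.set percent-encodes the target, so characters such as
  // '?', '&' and '=' inside it cannot be mistaken for the service's own query.
  endpoint.searchParams.set('url', targetUrl);
  return endpoint.toString();
}

console.log(buildRenderUrl('https://my-app.example', 'https://example.com/?a=1&b=2'));
```

Without the encoding step, a target URL that carries its own query parameters would be split at the first `&` and the service would see only part of it.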

You may want to give a reason why you would want to suspend the service

"So I wanted to check how much resources were allocated to this app on fly by default. It was easy to know with the following commands" -> "I wanted to see what resources were allocated to the App on Fly. The scale commands allowed me to find out"

"Now your service is running well in one data center for me it was iad which is Ashburn, Virginia (US). Now let's add some more:"

Each sentence starts with Now. "Our service is now running in one data center. For me, it's iad (Ashburn, Virginia) but yours will likely be different based on where you are working from. We can add instances around the world to speed up responses..."

@geshan
Author

geshan commented Jul 27, 2020

@codepope appreciate you taking the time for such detailed feedback, all of it has been updated: https://github.com/geshan/js-renderer-fly . Let me know if it needs any more improvements, thanks!

@geshan
Author

geshan commented Jul 29, 2020

Just a ping for this @codepope :).

@geshan
Author

geshan commented Jul 29, 2020

Thanks for the PRs @codepope , both have been merged. What would the next step be?

@mrkurt
Member

mrkurt commented Jul 29, 2020

@geshan Have you been emailing with @KittyBot? This looks great, next steps are:

  1. Transfer the repository to me (mrkurt)
  2. We'll work out payment over email.

@geshan
Author

geshan commented Jul 29, 2020

@mrkurt Yes I am in comms with @KittyBot . I have transferred the repo to you, thanks a lot!

@mrkurt
Member

mrkurt commented Jul 29, 2020

Good deal! It's here now, I'll get it in our docs soon: https://github.com/fly-examples/puppeteer-js-renderer

@mrkurt
Member

mrkurt commented Jul 30, 2020

Alrighty, now in the docs. I renamed the project and did a little cleanup of project names in the README text, but check it over and see if I missed anything: http://fly.io/docs/app-guides/puppeteer-js-renderer/

Feel free to submit PRs to the project for any changes you want to make.

@geshan
Author

geshan commented Jul 30, 2020

Looks really nice. There were some references to the old project name and GitHub URL; I have changed them in this PR, thanks!
