Sustainability 2022 Queries #2989

Merged
71 commits merged into main from sustainability-2022-queries
Aug 19, 2022

Conversation

camcash17
Contributor

@camcash17 camcash17 commented Jun 23, 2022

Progress on #2910

Contents of PR are duplicated from the Google doc outline

Hosting

  • % of “green hosted” sites
  • CDN usage

General

  • Co2e per page load
    (Page weight)
  • Request distribution
  • Requests by type

Cache

  • Cache adoption
    - [ ] Caching by resource type

Image Optimization

  • Lazy loading
    - [ ] Native lazy loading v. JS implementation
  • Adoption of formats
    - [ ] Image quality
  • Image size

JS & CSS

  • Compression
  • Minification
  • Unused Code
  • Inline v. external

Fonts

  • Requests per page
  • Format adoption
    - [ ] Unused font requests

Video

  • Preload
  • Autoplay

Third Parties

  • Green hosting
    - [ ] Co2e from third parties
    - [ ] Co2e by third-party category

Platform Summary

  • CMS
  • eCommerce
  • Jamstack

@camcash17 camcash17 added the analysis Querying the dataset label Jun 23, 2022
@camcash17 camcash17 added this to the 2022 Analysis milestone Jun 23, 2022
@camcash17 camcash17 marked this pull request as draft June 23, 2022 03:51
@camcash17 camcash17 linked an issue Jun 23, 2022 that may be closed by this pull request
6 tasks
@tunetheweb tunetheweb mentioned this pull request Jun 24, 2022
6 tasks
@fershad
Contributor

fershad commented Jul 2, 2022

@tunetheweb could we find some time later this week to look at 42714b0#diff-e62b9f849c03e2bbbcb42e98e4ecdfc786dea8b2ce842a87892bb286604dacd5?

The query we've got currently gives a total percentage, but I'd also like to break it down by % of the top 1,000, 10,000 and 100,000 sites.

@tunetheweb
Member

Details here: https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Analysts'-Guide#rank

So this works:

#standardSQL
# What percentage of URLs are hosted on a known green web hosting provider?

WITH green AS (
  SELECT
    NET.HOST(url) AS host,
    TRUE AS is_green
  FROM
    `httparchive.almanac.green_web_foundation`
  WHERE
    date = '2022-06-01'
),

pages AS (
  SELECT
    _TABLE_SUFFIX AS client,
    NET.HOST(url) AS host,
    rank
  FROM
    `httparchive.summary_pages.2022_06_01_*`
)

SELECT
  client,
  rank_grouping,
  COUNTIF(is_green) AS total_green,
  COUNT(0) AS total_sites,
  COUNTIF(is_green) / COUNT(0) AS pct_green
FROM
  pages
LEFT JOIN
  green
USING
  (host),
UNNEST([1000, 10000, 100000, 1000000, 10000000]) AS rank_grouping
WHERE
  rank <= rank_grouping
GROUP BY
  client,
  rank_grouping
ORDER BY
  client,
  rank_grouping
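
For anyone reading along, the rank logic in the query above can be illustrated outside BigQuery. This is only a minimal Python sketch, not part of the PR: the sample rows, the `GREEN_HOSTS` set and the `green_share_by_rank` helper are all hypothetical stand-ins for the `summary_pages` and `green_web_foundation` tables, and it assumes each page's `rank` is a bucketed value such as 1000 or 10000. The point it shows is that the groupings are cumulative: a page with rank 1000 is also counted in every larger grouping.

```python
from collections import defaultdict

# Hypothetical sample rows (client, host, rank) standing in for summary_pages.
PAGES = [
    ("desktop", "a.example", 1000),
    ("desktop", "b.example", 1000),
    ("desktop", "c.example", 10000),
    ("desktop", "d.example", 100000),
    ("mobile", "a.example", 1000),
    ("mobile", "e.example", 10000),
]

# Hypothetical set of hosts standing in for the green_web_foundation table.
GREEN_HOSTS = {"a.example", "d.example"}

# Same groupings as the UNNEST([...]) in the SQL query.
RANK_GROUPINGS = [1000, 10000, 100000, 1000000, 10000000]


def green_share_by_rank(pages, green_hosts, groupings=RANK_GROUPINGS):
    """Return {(client, grouping): (total_green, total_sites, pct_green)}."""
    totals = defaultdict(int)
    greens = defaultdict(int)
    for client, host, rank in pages:
        # Equivalent of the cross join with UNNEST plus WHERE rank <= rank_grouping:
        # a page contributes to every grouping at or above its own rank.
        for grouping in groupings:
            if rank <= grouping:
                key = (client, grouping)
                totals[key] += 1
                if host in green_hosts:
                    greens[key] += 1
    return {
        key: (greens[key], totals[key], greens[key] / totals[key])
        for key in sorted(totals)
    }


if __name__ == "__main__":
    for (client, grouping), row in green_share_by_rank(PAGES, GREEN_HOSTS).items():
        print(client, grouping, row)
```

Calling `green_share_by_rank(PAGES, GREEN_HOSTS)` yields one `(total_green, total_sites, pct_green)` tuple per client and grouping, mirroring the columns of the SQL result. Unlike the LEFT JOIN, the set lookup cannot multiply rows if a host is listed more than once.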

Member

@tunetheweb tunetheweb left a comment

Looking good so far!

A few comments:

.talismanrc
sql/2022/sustainability/cdn_adoption.sql
sql/2022/sustainability/green_web_hosting.sql
sql/2022/sustainability/page_bytes_per_type.sql
sql/2022/sustainability/requests_by_type.sql
sql/2022/sustainability/cms_bytes_per_type.sql
sql/2022/sustainability/ecommerce_bytes_per_type.sql
sql/2022/sustainability/stylesheet_count.sql
sql/2022/sustainability/text_compression.sql
sql/2022/sustainability/unminified_css_bytes.sql
sql/2022/sustainability/unminified_js_bytes.sql
@tunetheweb
Member

I still have some open comments on this PR, but I think most of the queries are in a reasonably fit state. I would suggest starting to run them and saving the data to the sheet, so you can see what the data looks like, while also addressing the comments I've made. I'm a little uncertain about exactly what we hope to get out of some of the queries, but maybe once we see the data it will make more sense (or you'll all see that a query perhaps doesn't make as much sense as you thought).

We'll still need this PR reviewed and merged, but as long as most of the queries are OK, a few might just need rerunning once the review has identified corrections. You may also find some need slight tweaks as you run them.

@fershad
Contributor

fershad commented Aug 17, 2022

I've nuked the Green Third Parties query since it was returning some pretty strange results. I've rewritten it (f3dffe7) to produce results that look more plausible and give us something to talk about.

Member

@tunetheweb tunetheweb left a comment

I think this is almost there as far as I can see.

Can we look at the open comments and then get this merged? We can open new PRs for any new queries we want to add after.

sql/2022/sustainability/image_formats.sql
sql/2022/sustainability/stylesheet_count.sql
sql/2022/sustainability/unminified_css_bytes.sql
sql/2022/sustainability/unused_css_bytes_distribution.sql
sql/2022/sustainability/unused_js_bytes_distribution.sql
@fershad
Contributor

fershad commented Aug 19, 2022

@tunetheweb I've updated the checklist at the top. We've got one query on Font format adoption that has been missed, but I'm guessing the Fonts chapter has data on this that we can use.

@tunetheweb tunetheweb merged commit 6812c68 into main Aug 19, 2022
@tunetheweb tunetheweb deleted the sustainability-2022-queries branch August 19, 2022 21:14
Labels
analysis Querying the dataset
Development

Successfully merging this pull request may close these issues.

Sustainability 2022
6 participants