Nitter RSS: Handle over-processed links and metadata #131

Open
nemobis opened this issue Jun 25, 2023 · 7 comments

@nemobis
Contributor

nemobis commented Jun 25, 2023

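Using the RSS import option with Nitter works quite well, but the resulting posts are hard to read because every hashtag is turned into a full link to the Nitter instance's search page (e.g. https://nitter.lacontrevoie.fr/search?q=%23Copernicus).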

Also, `nitter_base_url` isn't applied: the links to the original post keep the host provided by the RSS feed (here nitter.lacontrevoie.fr) instead of pointing to the original.
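A minimal sketch of how item links could be rewritten, assuming the goal is to swap the feed's host for the one configured in `nitter_base_url` while keeping the path. `rebase_link` and `https://nitter.example` are hypothetical names for illustration, not pleroma-bot's actual code, and a base URL with its own path prefix is not handled.

```python
from urllib.parse import urlsplit, urlunsplit


def rebase_link(link: str, nitter_base_url: str) -> str:
    """Swap the scheme and host of a feed link for those of nitter_base_url."""
    base = urlsplit(nitter_base_url)
    parts = urlsplit(link)
    # Path, query and fragment of the original link are kept as they are.
    return urlunsplit((base.scheme, base.netloc, parts.path, parts.query, parts.fragment))


print(rebase_link(
    "https://nitter.lacontrevoie.fr/CopernicusEU/status/1672877701554659329#m",
    "https://nitter.example",
))
# https://nitter.example/CopernicusEU/status/1672877701554659329#m
```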

An example RSS feed from an instance running Nitter 2023.05.30-38985af:

<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
  <channel>
    <atom:link href="https://nitter.lacontrevoie.fr/CopernicusEU/rss" rel="self" type="application/rss+xml" />
    <title>Copernicus EU / @CopernicusEU</title>
    <link>https://nitter.lacontrevoie.fr/CopernicusEU</link>
    <description>Twitter feed for: @CopernicusEU. Generated by nitter.lacontrevoie.fr
</description>
    <language>en-us</language>
    <ttl>40</ttl>
    <image>
      <title>Copernicus EU / @CopernicusEU</title>
      <link>https://nitter.lacontrevoie.fr/CopernicusEU</link>
      <url>https://nitter.lacontrevoie.fr/pic/pbs.twimg.com%2Fprofile_images%2F1629950827925315587%2F3gnlK62Y_400x400.jpg</url>
      <width>128</width>
      <height>128</height>
    </image>
    <item>
      <title>#Copernicus for #wildfire monitoring

Our #OpenData provides high-resolution imagery to monitor fires🔥 around the world, especially in sensitive &amp; protected ecosystems

On 23 June, our #Sentinel2🇪🇺🛰️ satellite captured this fire in Mexico&apos;s Pantanos de Centla Biosphere Reserve🇲🇽</title>
      <dc:creator>@CopernicusEU</dc:creator>
      <description><![CDATA[<p><a href="https://nitter.lacontrevoie.fr/search?q=%23Copernicus">#Copernicus</a> for <a href="https://nitter.lacontrevoie.fr/search?q=%23wildfire">#wildfire</a> monitoring<br>
<br>
Our <a href="https://nitter.lacontrevoie.fr/search?q=%23OpenData">#OpenData</a> provides high-resolution imagery to monitor fires🔥 around the world, especially in sensitive &amp; protected ecosystems<br>
<br>
On 23 June, our <a href="https://nitter.lacontrevoie.fr/search?q=%23Sentinel2">#Sentinel2</a>🇪🇺🛰️ satellite captured this fire in Mexico's Pantanos de Centla Biosphere Reserve🇲🇽</p>
<img src="https://nitter.lacontrevoie.fr/pic/media%2FFzdBZyGWYAIYkDm.jpg" style="max-width:250px;" />]]></description>
      <pubDate>Sun, 25 Jun 2023 08:01:39 GMT</pubDate>
      <guid>https://nitter.lacontrevoie.fr/CopernicusEU/status/1672877701554659329#m</guid>
      <link>https://nitter.lacontrevoie.fr/CopernicusEU/status/1672877701554659329#m</link>
    </item>
  </channel>
</rss>
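One way to avoid the over-processing at the source would be to unwrap Nitter's own hashtag and profile anchors in the `<description>` HTML, keeping only the link text, before any link expansion runs. This is a sketch assuming the anchors look exactly like the ones in the feed above (double-quoted `href`, no extra attributes) and that mentions are rendered as `<a href="https://<host>/<User>">@<User></a>`; `unwrap_nitter_links` is a hypothetical helper. Links to other sites are left untouched.

```python
import re


def unwrap_nitter_links(description: str, host: str) -> str:
    """Replace Nitter hashtag-search and profile anchors with their link text."""
    # Either a hashtag search (%23 is the URL-encoded "#") or a
    # single-segment profile path on the same Nitter host.
    anchor = re.compile(
        r'<a href="https?://' + re.escape(host)
        + r'/(?:search\?q=%23[^"]*|[A-Za-z0-9_]+)">([^<]*)</a>'
    )
    return anchor.sub(r"\1", description)


desc = (
    '<p><a href="https://nitter.lacontrevoie.fr/search?q=%23Copernicus">#Copernicus</a>'
    ' for <a href="https://nitter.lacontrevoie.fr/search?q=%23wildfire">#wildfire</a>'
    ' monitoring, see <a href="https://europa.eu/!whNGXQ">europa.eu/!whNGXQ</a></p>'
)
print(unwrap_nitter_links(desc, "nitter.lacontrevoie.fr"))
# <p>#Copernicus for #wildfire monitoring, see <a href="https://europa.eu/!whNGXQ">europa.eu/!whNGXQ</a></p>
```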

@nemobis
Contributor Author

nemobis commented Jun 25, 2023

Example output: https://respublicae.eu/@EURLex/110603986431571007

[nitter.lacontrevoie.fr/search?q=%23OnThisDay](https://nitter.lacontrevoie.fr/search?q=%23OnThisDay)
in 1990, signature of the
https://nitter.lacontrevoie.fr/search?q=%23Schengen
Convention between 🇫🇷,🇧🇪,🇩🇪, 🇱🇺 &amp;  🇳🇱
Together with the agreement of 1985 &amp; accession agreements, it forms the
https://nitter.lacontrevoie.fr/search?q=%23SchengenAcquis
allowing over 400 million people to travel freely without border controls  ➡️
https://europa.eu/!whNGXQ

(It also shows a literal `&amp;` instead of `&`.)
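If the text has already been flattened, as in the output above, a post-processing pass could turn leftover search URLs back into hashtags and decode the stray `&amp;`. A standard-library sketch; `clean_flattened_text` is hypothetical and assumes each search URL ends at whitespace. Note that `html.unescape` decodes every entity, not only `&amp;`, which should be acceptable for plain-text posts.

```python
import html
import re
from urllib.parse import unquote


def clean_flattened_text(text: str, host: str) -> str:
    """Turn leftover Nitter search URLs into hashtags and decode HTML entities."""
    # "%23" is the URL-encoded "#", so unquoting "%23Schengen" yields "#Schengen".
    search_url = re.compile(r"https?://" + re.escape(host) + r"/search\?q=(%23\S+)")
    text = search_url.sub(lambda m: unquote(m.group(1)), text)
    # Decode entities such as "&amp;" that survived into the plain text.
    return html.unescape(text)


sample = (
    "in 1990, signature of the\n"
    "https://nitter.lacontrevoie.fr/search?q=%23Schengen\n"
    "Convention between 🇫🇷 &amp; 🇳🇱"
)
print(clean_flattened_text(sample, "nitter.lacontrevoie.fr"))
```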

@nemobis
Contributor Author

nemobis commented Jun 25, 2023

And for usernames (https://respublicae.eu/@EURLex/110603985985909102):

.
https://nitter.lacontrevoie.fr/EUCouncil
has adopted a resolution on
https://nitter.lacontrevoie.fr/search?q=%23customs
cooperation in the area of
https://nitter.lacontrevoie.fr/search?q=%23lawenforcement
and its contribution to the
https://nitter.lacontrevoie.fr/search?q=%23internalsecurity
of the EU
👉
https://europa.eu/!4KHvWh
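Profile links could be handled in a similar way, turning a bare `https://<host>/<name>` URL back into `@<name>`. This sketch assumes that a URL with no further path, query or fragment is a profile link, and relies on Twitter handles being 1 to 15 letters, digits or underscores; `profile_urls_to_mentions` is a hypothetical helper.

```python
import re


def profile_urls_to_mentions(text: str, host: str) -> str:
    """Replace bare Nitter profile URLs with @mentions."""
    # The lookahead rejects names longer than 15 characters and URLs that
    # continue with a path, query or fragment (e.g. /search?q=... or /status/...).
    profile_url = re.compile(
        r"https?://" + re.escape(host) + r"/([A-Za-z0-9_]{1,15})(?![\w/?#])"
    )
    return profile_url.sub(r"@\1", text)


sample = ".\nhttps://nitter.lacontrevoie.fr/EUCouncil\nhas adopted a resolution on"
print(profile_urls_to_mentions(sample, "nitter.lacontrevoie.fr"))
```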

@nemobis
Contributor Author

nemobis commented Jun 25, 2023

Also, I'm not sure it's useful to prefix posts with "RT by <current_user>" rather than just "RT", as in https://respublicae.eu/@EURLex/110603985605674224:

RT by [@EURLex](https://respublicae.eu/@EURLex): .[@EUinNL](https://respublicae.eu/@EUinNL) [#tenders](https://respublicae.eu/tags/tenders) - Netherlands-The Hague: Security Guard and Reception/Switchboard Services for the Premises in the Netherlands - 17/07/2023 - https://europa.eu/!xcTcYg
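Dropping the account name from the retweet prefix would be a small substitution, assuming the prefix arrives as `RT by @<handle>: ` at the start of the item title, as the rendered post suggests. `simplify_rt_prefix` is a hypothetical helper; titles without the prefix pass through unchanged.

```python
import re

# Assumed shape of Nitter retweet titles: "RT by @<handle>: <text>".
RT_PREFIX = re.compile(r"^RT by @\w+: ")


def simplify_rt_prefix(title: str) -> str:
    """Replace a leading "RT by @<handle>: " with a plain "RT: "."""
    return RT_PREFIX.sub("RT: ", title, count=1)


print(simplify_rt_prefix("RT by @EURLex: .@EUinNL #tenders - Netherlands-The Hague"))
# RT: .@EUinNL #tenders - Netherlands-The Hague
```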

@nemobis
Contributor Author

nemobis commented Jun 25, 2023

Some of these issues were already reported in #122 and fixed in the test version; I will need to check again.

@nemobis
Contributor Author

nemobis commented Oct 19, 2023

The current state can be seen at https://respublicae.eu/@EURLex (using 1.1.1rc58).

@nemobis
Contributor Author

nemobis commented Nov 20, 2023

The author field can be missing too:

The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/home/7/federico/mastodon/bot/lib/python3.9/site-packages/pleroma_bot/cli.py", line 623, in main
    tweets_rss = user.parse_rss_feed(
  File "/home/7/federico/mastodon/bot/lib/python3.9/site-packages/pleroma_bot/_utils.py", line 850, in parse_rss_feed
    for idx, res in enumerate(
  File "/usr/lib/python3.9/multiprocessing/pool.py", line 870, in next
    raise value
AttributeError: object has no attribute 'author'
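A defensive lookup would avoid the crash. feedparser's `FeedParserDict` raises `AttributeError` for a missing key (which matches the message above), so `getattr` with a default works on it. `entry_author` and the choice of fallback (for instance the configured account name) are hypothetical, and `SimpleNamespace` only stands in for a parsed feed entry in the example; this has not been checked against how `parse_rss_feed` uses the field.

```python
from types import SimpleNamespace


def entry_author(entry, fallback: str) -> str:
    """Return the entry's author, or fallback when the field is missing or empty."""
    # getattr's default is used whenever attribute access raises AttributeError,
    # which is what feedparser entries do for absent keys.
    return getattr(entry, "author", None) or fallback


# SimpleNamespace stands in for a parsed feed entry here.
with_author = SimpleNamespace(author="@CopernicusEU")
without_author = SimpleNamespace(title="#Copernicus for #wildfire monitoring")
print(entry_author(with_author, "@EURLex"))     # @CopernicusEU
print(entry_author(without_author, "@EURLex"))  # @EURLex
```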

@nemobis
Contributor Author

nemobis commented Nov 20, 2023

It would probably also be good to strip the HTML markup in posts like https://kolektiva.social/@pyorapajahelsinki/111443408564161235 (although this one comes from a Telegram feed: https://rsshub.app/telegram/channel/pyorapaja).
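For feeds like the RSSHub Telegram one, a generic pass that keeps only the text content might be enough. A standard-library sketch using `html.parser`; `TextExtractor` and `strip_html` are hypothetical names. Images are dropped, `<br>`, `</p>` and `</div>` become newlines, and entities such as `&amp;` are decoded by the parser's default `convert_charrefs=True`.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, turning <br> and block ends into newlines."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # <br> and <br/> both reach this method.
        if tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("p", "div"):
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(markup: str) -> str:
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts).strip()


print(strip_html("<p>Hello &amp; welcome<br>second line</p><img src='x.jpg'>"))
```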
