Reddit feed update fails on r/freeebooks #1104
What version of selfoss are you using? There have been some improvements to the reddit spout on master, do you think you could try that? https://bintray.com/fossar/selfoss/selfoss-git/2.19-0ea67f9#files |
It works for me on master. One thing that I noticed that some posts have |
I tried with 2.18, and a snapshot, 2.19-0ea67f9. I was able to replicate the issue in both cases. I am going to try with the current master branch and report back with the results. |
I have just tried adding the reddit source again, using the latest Master branch. Unfortunately, the problem still persists, though now with a different Kindle book list. |
Your log does not really tell us much. The last three lines are logged whenever a web request is made against selfoss. I would expect to see some log lines related to icons or thumbnails before “item inserted” is printed.
|
The log snippet I posted above is produced when fetching from this particular reddit source. After the item is sanitized, it is not inserted and the entire update process stops: no later items are inserted and no database optimisation or cleanup happens, as it would in an update without the reddit source. On the main Selfoss page, I get a message stating that Selfoss was unable to refresh sources: timeout. I have turned on DEBUG on line 17 of common.php, but unfortunately nothing extra shows up in the logs. |
Fixed the thumbnail-less warnings in 3967955. I can reproduce the refresh timeout in source refresh on master but the update still works in the background:
(this with |
The timeout error does not happen with the latest changes. My log entries are unfortunately the same, with the DEBUG log level and DEBUG set to 1.
|
That is weird, I would expect at least a message about a HTTP request. Do you see those for previous items? Could you try the following patch for more granular logging?
--- a/helpers/ContentLoader.php
+++ b/helpers/ContentLoader.php
@@ -185,6 +185,7 @@
return;
}
+ \F3::get('logger')->debug('cl:newitem');
$newItem = [
'title' => $title,
'content' => $content,
@@ -198,12 +199,15 @@
];
// save thumbnail
+ \F3::get('logger')->debug('cl:fetchthumb');
$newItem = $this->fetchThumbnail($item->getThumbnail(), $newItem);
// save icon
+ \F3::get('logger')->debug('cl:fetchicon');
$newItem = $this->fetchIcon($item->getIcon(), $newItem, $lasticon);
// insert new item
+ \F3::get('logger')->debug('cl:add');
$this->itemsDao->add($newItem);
\F3::get('logger')->debug('item inserted');
--- a/spouts/reddit/reddit2.php
+++ b/spouts/reddit/reddit2.php
@@ -253,9 +253,11 @@
public function getIcon() {
$imageHelper = $this->getImageHelper();
$htmlUrl = $this->getHtmlUrl();
+ \F3::get('logger')->debug('reddit:getIcon:start');
if ($htmlUrl && $imageHelper->fetchFavicon($htmlUrl)) {
$this->faviconUrl = $imageHelper->getFaviconUrl();
}
+ \F3::get('logger')->debug('reddit:getIcon:end');
return $this->faviconUrl;
}
|
Thank you. Here is my new log:
|
I would expect at least a message about a HTTP request. Do you see those for previous items? |
Here is even more detailed logging:
--- a/helpers/ContentLoader.php
+++ b/helpers/ContentLoader.php
@@ -185,6 +185,7 @@
return;
}
+ \F3::get('logger')->debug('cl:newitem');
$newItem = [
'title' => $title,
'content' => $content,
@@ -198,12 +199,15 @@
];
// save thumbnail
+ \F3::get('logger')->debug('cl:fetchthumb');
$newItem = $this->fetchThumbnail($item->getThumbnail(), $newItem);
// save icon
+ \F3::get('logger')->debug('cl:fetchicon');
$newItem = $this->fetchIcon($item->getIcon(), $newItem, $lasticon);
// insert new item
+ \F3::get('logger')->debug('cl:add');
$this->itemsDao->add($newItem);
\F3::get('logger')->debug('item inserted');
--- a/helpers/Image.php
+++ b/helpers/Image.php
@@ -42,6 +42,7 @@
*/
public function fetchFavicon($url, $isHtmlUrl = false, $width = null, $height = null) {
// try given url
+ \F3::get('logger')->debug('img:favicon:try given url');
if ($isHtmlUrl === false) {
$faviconAsPng = $this->loadImage($url, $width, $height);
if ($faviconAsPng !== null) {
@@ -54,6 +55,7 @@
$urlElements = parse_url($url);
// search on base page for <link rel="shortcut icon" url...
+ \F3::get('logger')->debug('img:favicon:search on base page for <link rel="shortcut icon" url...');
$html = null;
try {
$html = \helpers\WebClient::request($url);
@@ -61,7 +63,12 @@
\F3::get('logger')->debug('icon: failed to get html page: ', ['exception' => $e]);
}
+ // parse
+ \F3::get('logger')->debug('img:favicon:parse');
$shortcutIcon = $this->parseShortcutIcon($html);
+
+ // try to load the discovered icon
+ \F3::get('logger')->debug('img:favicon:try to load the discovered icon');
if ($shortcutIcon !== null) {
$shortcutIcon = (string) UriResolver::resolve(new Uri($url), new Uri($shortcutIcon));
@@ -74,6 +81,7 @@
}
// search domain/favicon.ico
+ \F3::get('logger')->debug('img:favicon:search domain/favicon.ico');
if (isset($urlElements['scheme']) && isset($urlElements['host'])) {
$url = $urlElements['scheme'] . '://' . $urlElements['host'] . '/favicon.ico';
$faviconAsPng = $this->loadImage($url, $width, $height);
@@ -99,6 +107,7 @@
*/
public function loadImage($url, $extension = 'png', $width = null, $height = null) {
// load image
+ \F3::get('logger')->debug('img:loadimage:load image');
try {
$data = \helpers\WebClient::request($url);
} catch (\Exception $e) {
@@ -108,6 +117,7 @@
}
// get image type
+ \F3::get('logger')->debug('img:loadimage:get image type');
$imgInfo = @getimagesizefromstring($data);
if (in_array(strtolower($imgInfo['mime']), self::$faviconMimeTypes, true)) {
$type = 'ico';
@@ -124,6 +134,7 @@
}
// convert ico to png
+ \F3::get('logger')->debug('img:loadimage:convert ico to png');
if ($type === 'ico') {
$loader = new IcoFileService();
try {
@@ -156,6 +167,7 @@
}
// parse image for saving it later
+ \F3::get('logger')->debug('img:loadimage:parse image for saving it later');
try {
$wideImage = WideImage::load($data);
} catch (\Exception $e) {
@@ -163,6 +175,7 @@
}
// resize
+ \F3::get('logger')->debug('img:loadimage:resize');
if ($width !== null && $height !== null) {
if (($height !== null && $wideImage->getHeight() > $height) ||
($width !== null && $wideImage->getWidth() > $width)) {
@@ -171,6 +184,7 @@
}
// return image as jpg or png
+ \F3::get('logger')->debug('img:loadimage:return image as jpg or png');
if ($extension === 'jpg') {
$data = $wideImage->asString('jpg', 75);
} else {
--- a/spouts/reddit/reddit2.php
+++ b/spouts/reddit/reddit2.php
@@ -253,9 +253,11 @@
public function getIcon() {
$imageHelper = $this->getImageHelper();
$htmlUrl = $this->getHtmlUrl();
+ \F3::get('logger')->debug('reddit:getIcon:start');
if ($htmlUrl && $imageHelper->fetchFavicon($htmlUrl)) {
$this->faviconUrl = $imageHelper->getFaviconUrl();
}
+ \F3::get('logger')->debug('reddit:getIcon:end');
return $this->faviconUrl;
} |
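The start/end markers added by hand in the patches above follow one pattern, which could be factored into a small helper. The following is only a sketch: `traceStep` is a hypothetical name, not part of selfoss, and `$log` stands in for any callable that accepts a message (in selfoss this might be `[\F3::get('logger'), 'debug']`, which is an assumption).

```php
<?php
declare(strict_types=1);

/**
 * Run $step, logging a start marker before it and an end marker
 * (with elapsed time) after it. Hypothetical helper for illustration.
 */
function traceStep(callable $log, string $label, callable $step)
{
    $log("$label:start");
    $startedAt = microtime(true);
    try {
        return $step();
    } finally {
        // Runs on normal return and on exceptions alike.
        $log(sprintf('%s:end (%.3f s)', $label, microtime(true) - $startedAt));
    }
}

// Usage with an in-memory logger standing in for the real one.
$lines = [];
$log = function (string $message) use (&$lines): void {
    $lines[] = $message;
};
$result = traceStep($log, 'reddit:getIcon', function () {
    return 'favicon.png';
});
```

If a step hangs, its `:start` line is the last entry in the log and the matching `:end` line never appears, which is the same signal the manual markers were meant to give.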
Yes, I receive HTTP request messages about previous entries. The log now contains:
|
Looks like it cannot load the Amazon page. I will try to write in-Guzzle logging for even more details. For now, could you try to play with
--- a/helpers/WebClient.php
+++ b/helpers/WebClient.php
@@ -42,6 +42,7 @@
'User-Agent' => self::getUserAgent(),
],
'handler' => $stack,
+ 'timeout' => 60, // in seconds
]);
self::$httpClient = $httpClient; |
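The patch above sets only `timeout`. Guzzle also accepts a separate `connect_timeout` request option, and it documents `0` for either option as "wait indefinitely". As a rough sketch of setting both, assuming a hypothetical helper `buildHttpClientOptions` and arbitrary default values (neither comes from selfoss):

```php
<?php
declare(strict_types=1);

/**
 * Build Guzzle client options with explicit time limits.
 * Function name and default values are illustrative assumptions.
 */
function buildHttpClientOptions(
    string $userAgent,
    float $timeout = 60.0,
    float $connectTimeout = 10.0
): array {
    if ($timeout <= 0 || $connectTimeout <= 0) {
        // Guzzle documents 0 as "wait indefinitely", so reject it here.
        throw new InvalidArgumentException('Timeouts must be positive.');
    }

    return [
        'headers' => ['User-Agent' => $userAgent],
        // Upper bound for the whole request, in seconds.
        'timeout' => $timeout,
        // Upper bound for establishing the connection, in seconds.
        'connect_timeout' => $connectTimeout,
    ];
}

$options = buildHttpClientOptions('ExampleReader/1.0');

// With Guzzle installed through Composer, the array would be passed on:
// $client = new \GuzzleHttp\Client($options);
```

A short connect timeout lets unreachable hosts fail quickly, while the longer overall timeout still leaves room for slow but responsive servers.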
Thank you very much. It seems like this has solved the issue. I am getting errors as some of the images returned are null, but this is to be expected, as I think a lot of entries use Amazon, especially in this reddit feed. Would it be possible to have a configuration option that does not retrieve and store favicons and thumbnails at all? I noticed that we have a show_thumbnails option available in the Selfoss configuration file, however, I think images are still stored if it is set to 0. Please let me know if I should raise an issue regarding this. |
Yeah, I think it should be possible to opt to use the reddit favicon instead of the site icon or the image. I would actually prefer it but some people might not. I am thinking about making it a selfoss extension or something. Using timeouts by default is definitely a good idea, otherwise one bad feed can block everything indefinitely, as we saw here. We had them previously in some code paths but I missed them during the Guzzle port. We will need to come up with some values that will work the majority of the time. Maybe we could even do something like iterative deepening. We should also try to fetch multiple feeds in parallel to speed everything up, but that will require a wider redesign of the internals, one that I have planned but have not yet had time to carry out. Last but not least, the favicon routine is not very efficient either (it downloads the same page up to three times). We should add some caching. |
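For the caching idea, one possible shape is a per-run memoizing wrapper around whatever performs the download, so that repeated lookups of the same URL hit the network once. This is a sketch under assumptions: `CachingFetcher` is a hypothetical class, and the injected callable stands in for the real HTTP request.

```php
<?php
declare(strict_types=1);

/**
 * Wraps a fetch callable so each URL is downloaded at most once per run.
 * Hypothetical class for illustration, not selfoss code.
 */
final class CachingFetcher
{
    /** @var callable */
    private $fetch;

    /** @var array<string, string|null> */
    private $cache = [];

    public function __construct(callable $fetch)
    {
        $this->fetch = $fetch;
    }

    public function get(string $url): ?string
    {
        // array_key_exists (not isset) so cached failures (null) count as hits.
        if (!array_key_exists($url, $this->cache)) {
            try {
                $this->cache[$url] = ($this->fetch)($url);
            } catch (Exception $e) {
                // Remember the failure so a bad URL is not retried in this run.
                $this->cache[$url] = null;
            }
        }

        return $this->cache[$url];
    }
}

// Usage with a stub that counts how often it is called.
$downloads = 0;
$fetcher = new CachingFetcher(function (string $url) use (&$downloads): string {
    $downloads++;
    return "<html>$url</html>";
});
$first = $fetcher->get('https://example.com/');
$second = $fetcher->get('https://example.com/');
```

Caching failures as well as successes matters here, because a page that times out would otherwise be requested again by each later step.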
Let's make this issue about the timeouts. I can create issues for the other improvements. |
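The "iterative deepening" idea for timeouts could look roughly like the sketch below: try a short timeout first and retry with longer ones only when needed. The function name, the timeout values and the `$fetch` callable (taking a URL and a timeout in seconds, throwing on failure) are all assumptions for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Try a fetch with progressively longer timeouts.
 * Hypothetical helper; $fetch is any callable (string $url, float $timeout).
 */
function fetchWithEscalatingTimeouts(
    callable $fetch,
    string $url,
    array $timeouts = [5.0, 15.0, 60.0]
): string {
    $lastError = null;
    foreach ($timeouts as $timeout) {
        try {
            return $fetch($url, (float) $timeout);
        } catch (Exception $e) {
            // Keep the most recent failure and move on to the next timeout.
            $lastError = $e;
        }
    }

    throw new RuntimeException("All attempts failed for $url", 0, $lastError);
}

// Usage with a stub server that only answers when given at least 15 seconds.
$attempts = [];
$slowServer = function (string $url, float $timeout) use (&$attempts): string {
    $attempts[] = $timeout;
    if ($timeout < 15.0) {
        throw new RuntimeException('timed out');
    }
    return 'body';
};
$body = fetchWithEscalatingTimeouts($slowServer, 'https://example.com/feed');
```

The worst case for one URL is bounded by the sum of the listed timeouts (80 seconds with the defaults above), so a single bad feed can no longer block the update indefinitely.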
I subscribed to r/freeebooks via the Reddit spout. Currently, the update fails when cliupdate.php is executed via Cron or manually.
The relevant debug log:
The last three lines keep repeating in the log, even after hours of script execution.
The issue also happens with a brand new database, after importing an OPML feed.
I am running PHP 7.3.5, on Arch Linux ARM, on a Raspberry Pi 3, using sqlite as the database.
The page is reachable at http://ogres-crypt.com/Kindle/Free-Hobbies-Books.html