Extra body tag added by crawler #164

Closed
ajgarlag opened this issue Jan 19, 2023 · 1 comment · Fixed by #165

Comments

@ajgarlag

After markdown content is rendered into an HTML string, a body tag is added when that string is processed by HtmlIdProcessor.

You can see the problem by reviewing the source of any Stenope documentation web page: each one has a body tag inside the main tag:

<!-- https://stenopephp.github.io/Stenope/ structure -->
<html>
    <head>
        <!-- ... -->
    </head>
    <body>
        <div class="main">
            <div class="columns">
                <aside class="sidebar">
                    <!-- ... -->
                </aside>
                <main class="content">
                    <body>
                        <h1 id="stenope">Stenope<a href="#stenope" class="anchor"></a></h1>
                        <!-- ... -->
                    </body>
                </main>
            </div>
        </div>
        <!-- ... -->
    </body>
</html>

I've avoided this extra body tag by modifying the NaiveHtmlCrawlerManager::save method:

// src/Service/NaiveHtmlCrawlerManager.php
    public function save(Content $content, array &$data, string $property): void
    {
        $key = "{$content->getType()}:{$content->getSlug()}";

        if (isset($this->crawlers[$key][$property])) {
            // $data[$property] = $this->crawlers[$key][$property]->html(); // Old code
            $data[$property] = $this->crawlers[$key][$property]->children()->first()->html(); // New code
            unset($this->crawlers[$key][$property]);
        }
    }
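A possible explanation for why this change helps, sketched with PHP's built-in DOMDocument instead of the crawler itself: when an HTML fragment is parsed as a full document, libxml wraps it in html and body elements, so the inner HTML of the root node includes the body tag, while the inner HTML of the root's first child does not. That the crawler is rooted at the html element is an assumption here, not something confirmed in Stenope's code, and the innerHtml() helper is hypothetical and exists only for this illustration.

```php
<?php

declare(strict_types=1);

// Hypothetical helper (not part of Stenope): serializes the children
// of a node, i.e. its "inner HTML".
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $node->ownerDocument->saveHTML($child);
    }

    return $html;
}

// A fragment similar to what a markdown renderer would produce.
$fragment = '<h1 id="stenope">Stenope</h1>';

$document = new DOMDocument();
// Without LIBXML_HTML_NOIMPLIED, the parser adds the implied
// <html> and <body> wrappers around the fragment.
$document->loadHTML($fragment);

// Inner HTML of the root <html> element: still contains <body>.
$withBody = innerHtml($document->documentElement);

// Inner HTML of the <body> element: only the original fragment.
$body = $document->getElementsByTagName('body')->item(0);
$withoutBody = innerHtml($body);

echo $withBody, PHP_EOL;
echo $withoutBody, PHP_EOL;
```

If that assumption holds, children()->first() in the change above plays the same role as selecting the body element here.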

I've just started to evaluate this project and I'm not sure if this is the correct fix or if it is complete. Should I open a PR with this change?

@ajgarlag changed the title from "Body tag added by crawler" to "Extra body tag added by crawler" on Jan 19, 2023
@ogizanagi
Member

You're right, I spotted this issue a while ago (though I thought I had fixed it 🤔).
I don't know yet if this is the right fix, but thank you very much for the report 🙏🏻
