Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue parse css containing absolute urls #81

Merged
merged 5 commits into from
Jun 8, 2022

Conversation

brbog
Copy link

@brbog brbog commented Jun 7, 2022

See issue #80

This is a branch that started from another pull request (see #78 ), so better merge that one first to get a better overview of the new changes.

@rzo1
Copy link
Collaborator

rzo1 commented Jun 7, 2022

Thanks for the PR. Would you mind to do an rebase?

Bram Bogaert added 3 commits June 8, 2022 08:53
# Conflicts:
#	crawler4j-core/src/test/java/edu/uci/ics/crawler4j/tests/parser/CssParseDataTest.java
@brbog
Copy link
Author

brbog commented Jun 8, 2022

Thanks for the PR. Would you mind to do an rebase?

Merged with master.

I took the liberty to also include the fix I'm using for myself as it seems to work, and it removes a bunch of code that no longer needs to be maintained. Very invasive, but since this is code that is normally used elsewhere as well and the idea with these css urls should be the same (they end up as seeds), I guess it's correct.

@rzo1 rzo1 merged commit b49ad91 into HHN:master Jun 8, 2022
@rzo1
Copy link
Collaborator

rzo1 commented Jun 8, 2022

Thanks for the PR. Would you mind to do an rebase?

Merged with master.

I took the liberty to also include the fix I'm using for myself as it seems to work, and it removes a bunch of code that no longer needs to be maintained. Very invasive, but since this is code that is normally used elsewhere as well and the idea with these css urls should be the same (they end up as seeds), I guess it's correct.

Thanks for the PR. Guess, we can do a 4.9.1 soon ;)

@rzo1 rzo1 added this to the v4.9.1 milestone Jun 8, 2022
@brbog
Copy link
Author

brbog commented Jun 8, 2022

Thanks for the PR. Would you mind to do an rebase?

Merged with master.
I took the liberty to also include the fix I'm using for myself as it seems to work, and it removes a bunch of code that no longer needs to be maintained. Very invasive, but since this is code that is normally used elsewhere as well and the idea with these css urls should be the same (they end up as seeds), I guess it's correct.

Thanks for the PR. Guess, we can do a 4.9.1 soon ;)

4.9.1 should be within reach :-).
I think I submitted my last bug, because everything is running smoothly for me now (at the moment with workarounds).

If you want I can share a Maven configuration that can generate a change report (based on some stuff you manually decide), and then that could become part of the site. That way people can follow the changes you did with the frontiers etc.

@brbog brbog deleted the issue-parse-css-containing-absolute-urls branch June 8, 2022 15:03
@rzo1
Copy link
Collaborator

rzo1 commented Jun 8, 2022

Sure ;-) - currently, it's only using the GitHub generated changelogs

@brbog
Copy link
Author

brbog commented Jun 8, 2022

Sure ;-) - currently, it's only using the GitHub generated changelogs

Hmmm, didn't think of that. GitHubs' generated changelog is more reliable because it's automatic and is quite readable (not too many commits), so probably better just update the existing documentation as needed.

Anyway, to give an idea of what one can expect from a changelog plugin of Maven, you would have to manually keep track of changes in an xml-file with a structure as follows:

<document xmlns="http://maven.apache.org/changes/1.0.0"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/changes/1.0.0 http://maven.apache.org/xsd/changes-1.0.0.xsd">
	
	<properties>
		<title>Changes ${project.artifactId} Project</title>
	</properties>
	<body>
		<release version="1.1" date="2022-11-01" description="Subsequent release">
			<action type="add">
				New feature X.
			</action>
			<action type="fix">
				Fix "Some bug...".
			</action>
			<action type="remove" due-to="Some customer">
				Deleted the erroneous code.
			</action>
			<action type="update">
				Updated documentation.
			</action>
		</release>

		<release version="1.0" date="2022-09-01" description="First release">
			<action type="add">
				Initial features...
			</action>
		</release>
	</body>
</document>

Most stuff is optional and the type can be anything, but only the 4 in this example end up with a custom icon next to them.

Getting rid of Groovy is probably a better long term investment :-).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants