Using dumpgenerator.py with Python 3 #395

Open
EndeavourAccuracy opened this issue Sep 29, 2020 · 29 comments · May be fixed by #409

Comments

@EndeavourAccuracy

Python is currently at version 3.8.6 and your code requires 2.7.

As of 23 April, Ubuntu (Focal Fossa; 20.04) repos no longer carry kitchen for Python 2; only python3-kitchen.
Similarly, as of 27 June, Mint (Ulyana; 20) can no longer access this because it relies on Ubuntu repos.

As a result, modern distros can no longer use dumpgenerator.py.

I'm not a Python programmer.
Nevertheless, I've tried converting dumpgenerator.py from Python 2 to Python 3.
This attempt was unsuccessful.

I've:

  • replaced print "" and print '' with print ("")
  • replaced ur'' with r'' (This is for Python 3. If this needs to work with both 2 and 3, we'd apparently have to use u'' and escape any backslashes in the strings.)
  • replaced cPickle with pickle, and cookielib with http.cookiejar
    But then I ran into this error, and I could not continue:
    "RecursionError: maximum recursion depth exceeded while calling a Python object"
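For anyone attempting the same conversion, the substitutions listed above look roughly like this in Python 3. This is a minimal sketch, not code from dumpgenerator.py: the regular expression, the printed message, and the settings dictionary are made up for illustration. It does not address the RecursionError, whose cause is not identified in this report.

```python
import http.cookiejar  # Python 2: import cookielib
import pickle          # Python 2: import cPickle
import re

# print is a function in Python 3 (Python 2: print "Checking API...").
print("Checking API...")

# Python 3 has no ur'' prefix; every str is Unicode, so r'' is enough.
# The pattern itself is a hypothetical example.
title_pattern = re.compile(r"<title>([^<]+)</title>")

# http.cookiejar provides the classes that cookielib used to provide.
cookie_jar = http.cookiejar.MozillaCookieJar()

# pickle round-trips an object to bytes and back, as cPickle did.
config = {"xml": True, "images": False}  # hypothetical settings
restored = pickle.loads(pickle.dumps(config))
```

Renaming imports and print statements is usually the easy part; the str/bytes split tends to need case-by-case decisions.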

Also, I have my own (C and PHP/JavaScript) FOSS programming projects to work on.

Can you folks work on making a version of dumpgenerator.py that works with Python 3?

@nemobis
Member

nemobis commented Sep 29, 2020 via email

@OAHOR

OAHOR commented Nov 8, 2020

For efficiently working with legacy versions of Python, it is recommended to use venv or (my personal preference) miniconda. Miniconda creates 'environments' which can contain any version of (for instance) python without affecting system python.

You can download miniconda:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
And install it:
bash Miniconda3-latest-Linux-x86_64.sh

After restarting your shell window or SSH session, you can create a new conda environment 'wiki':
conda create -n wiki
Enter it by activating the environment:
conda activate wiki
Now you can install your special snowflake version of python:
conda install python=2.7

This python version 2.7 is only accessible from within the conda environment 'wiki' and does not affect the system files. The binaries are stored in your home directory, and none of this requires root access as you are not changing system files.

@jrbray1

jrbray1 commented Jan 1, 2021

I was caught by this because Mint 20 does not have python2 or pip by default. It seems overkill to install them both just to get dumpgenerator.py working. Why can't it be upgraded to a modern version?

@tiefpunkt

@EndeavourAccuracy do you have your modifications available somewhere to work against? So we could try to fix the errors you were running into?

@EndeavourAccuracy
Author

@EndeavourAccuracy do you have your modifications available somewhere to work against? So we could try to fix the errors you were running into?

Not any more, no. Part of my reasoning not to keep it was that I'm not a Python programmer, and I might therefore have accidentally introduced - perhaps hard to spot - errors in the code.

@elsiehupp

I tried following the instructions in the main README on both macOS 11 and Debian 11, and in both cases it gave me the following:

$ ./dumpgenerator.py --help
Please install the kitchen module.
Please install or update the Requests module.

...even after running $ pip install --upgrade -r requirements.txt. I assume this is because Python2 is basically EOL, and it's increasingly difficult to set up a working Python2 environment.
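One possible cause that is not confirmed anywhere in this thread is that `pip` and the interpreter running the script belong to different Python installations. The following sketch, run with the same interpreter that runs dumpgenerator.py, shows which interpreter is active and which of the named modules it cannot find. The helper name `missing_modules` is made up for this example.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names this interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The interpreter that is actually running this code.
print("Interpreter:", sys.executable)
print("Version:", sys.version.split()[0])

# The two modules named in the error message above.
print("Missing:", missing_modules(["kitchen", "requests"]))
```

If modules are reported missing, installing with `python -m pip install ...` (using that same `python`) targets the interpreter shown, which plain `pip` does not guarantee.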

Anyway, I'm muddling my way through @OAHOR's instructions for miniconda, and if it's the most reliable way of running dumpgenerator.py these days, then maybe it should be added to the instructions in the main README?

elsiehupp added a commit to mediawiki-client-tools/mediawiki-dump-generator that referenced this issue May 27, 2021
@EndeavourAccuracy
Author

EndeavourAccuracy commented Jun 9, 2021

I hope this ticket won't be closed referencing a work-around to get Python 2 working on modern systems.
(Since it still won't allow using dumpgenerator.py with Python 3.)

@EndeavourAccuracy
Author

Here's a basic manual for creating backups without shell access, and without using dumpgenerator.py.
This is for users who do have phpMyAdmin and FTP access, and want to distribute a backup without sensitive data.
Use at your own risk.

Creating a MediaWiki Backup Without Sensitive Data, Using phpMyAdmin and FTP
Version 0.1 (June 9, 2021). Public domain.

This backup method does NOT require:
- shell access, or
- a Python 2 environment (e.g. for dumpgenerator.py).

USE THIS MANUAL AT YOUR OWN RISK

--------------------
[1/2] Database
--------------------
1. Launch phpMyAdmin.
2. On the left, click your MediaWiki database.
3. On the right, click tab "Export".
4. Select export method "Custom".
5. Optionally, unselect table "archive", which contains deleted edits. (How? Click "Select All", then Ctrl+click on "archive".)
6. Verify that section "Format-specific options" ends with "structure and data" selected.
7. Verify that section "Data dump options" uses "both of the above" as insert syntax.
8. Press the "Go" button, which will download the .sql file.
9. Remove private information from the .sql file:

Note: What you actually remove is your own decision. Below are suggestions.

Search: CREATE TABLE IF NOT EXISTS `user`
Remove: everything under "Dumping data for table `user`".
(That data could reveal the user_real_name, user_email, user_password, and user_newpassword. See, for example, "SELECT CONVERT(user_email USING utf8) FROM `user`;".)

Search: CREATE TABLE IF NOT EXISTS `watchlist`
Remove: everything under "Dumping data for table `watchlist`".
(That data could reveal which pages are watched/unwatched.)

Search: CREATE TABLE IF NOT EXISTS `recentchanges`
Remove: everything under "Dumping data for table `recentchanges`".
(That data could reveal rc_ip for each change.)

10. Done.
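
Step 9 can also be scripted instead of edited by hand. The sketch below is only an illustration under stated assumptions: it assumes the export marks each section with comment lines such as "-- Dumping data for table `user`" and "-- Table structure for table `watchlist`", and that no data row starts with such a comment. The function name `strip_private_rows` and the sample dump are made up. Check the result against your own .sql file before distributing it.

```python
import re

# Tables whose rows the manual suggests removing.
PRIVATE_TABLES = {"user", "watchlist", "recentchanges"}

DATA_MARKER = re.compile(r"^-- Dumping data for table `([^`]+)`")
SECTION_START = re.compile(
    r"^-- (Table structure for table|Dumping data for table) `")


def strip_private_rows(lines, tables=PRIVATE_TABLES):
    """Yield the dump's lines, leaving out the data sections of `tables`."""
    skipping = False
    for line in lines:
        marker = DATA_MARKER.match(line)
        if marker:
            skipping = marker.group(1) in tables
            if skipping:
                yield "-- Data for table `%s` removed\n" % marker.group(1)
                continue
        elif SECTION_START.match(line):
            skipping = False
        if not skipping:
            yield line


# Tiny made-up dump to show the effect.
sample = (
    "-- Dumping data for table `page`\n"
    "INSERT INTO `page` VALUES (1, 'Main_Page');\n"
    "-- Dumping data for table `user`\n"
    "INSERT INTO `user` VALUES (1, 'secret@example.org');\n"
    "-- Table structure for table `watchlist`\n"
    "CREATE TABLE IF NOT EXISTS `watchlist` (wl_id int);\n"
)
cleaned = "".join(strip_private_rows(sample.splitlines(keepends=True)))
```

For a real export, the same function can be fed an open file object and its output written to a second file, so the original download stays untouched.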

--------------------
[2/2] File system
--------------------
1. Download all files via FTP.
2. Remove private information:

Note: What you actually remove is your own decision. Below are suggestions.

Modify or delete LocalSettings.php.

Maybe delete directory images/archive/.

Maybe delete directory images/deleted/.

Maybe delete directory images/temp/.

Maybe delete cache/.

3. Done.

@elsiehupp

I just made it so that my existing pull request doesn’t auto-close this issue. I’m working on a Python 3 version right now.

@elsiehupp elsiehupp linked a pull request Jun 9, 2021 that will close this issue
@elsiehupp

Can all y’all give #409 a spin? Thanks!

@EndeavourAccuracy
Author

Can all y’all give #409 a spin? Thanks!

Personally, I've just moved to another method. I also lack the time to test-run the updated script, sorry. If this would've come just a bit earlier, I might have made different choices. I've been a bit surprised that so few users have made themselves heard here, even though this ticket has been open since September 2020. I'm guessing most MediaWiki admins have, and use, shell access for backups.

@elsiehupp

That’s fine. I’ve since figured out how to run the CI tests locally, so I can do most of my testing myself.

@olinorwell

olinorwell commented Jun 14, 2021

I tried your version for Python3 without luck unfortunately, trying to download the Vim Wikia I get this error.

Can all y’all give #409 a spin? Thanks!

% python dumpgenerator.py https://vim.fandom.com/wiki/Vim_Tips_Wiki --xml --images
/home/oli/programs/wikiteam/dumpgenerator.py:1142: SyntaxWarning: "is not" with a literal. Did you mean "!="?
if buffer[-1] is not '\n':
/home/oli/programs/wikiteam/dumpgenerator.py:1524: SyntaxWarning: "is not" with a literal. Did you mean "!="?
if xmlfiledesc is not '' and not re.search(r'', xmlfiledesc):
Checking API... https://vim.fandom.com/api.php
API is OK: https://vim.fandom.com/api.php
Checking index.php... https://vim.fandom.com/index.php
index.php is OK

Welcome to DumpGenerator 0.4.0-alpha by WikiTeam (GPL v3)

More info at: https://github.com/WikiTeam/wikiteam

Analysing https://vim.fandom.com/api.php
Traceback (most recent call last):
  File "/home/oli/programs/wikiteam/dumpgenerator.py", line 2555, in <module>
    main()
  File "/home/oli/programs/wikiteam/dumpgenerator.py", line 2542, in main
    saveConfig(config=config, configfilename=configfilename)
  File "/home/oli/programs/wikiteam/dumpgenerator.py", line 1594, in saveConfig
    pickle.dump(config, outfile)
TypeError: write() argument must be str, not bytes
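
The output above points at two separate Python 3 issues: comparing against a string literal with `is not`, and passing a text-mode file to `pickle.dump`. A minimal sketch of both, independent of dumpgenerator.py (the file path and the settings dictionary are made up for illustration):

```python
import os
import pickle
import tempfile

# 1. Compare string values with != rather than "is not".
#    "is not" tests object identity, which is why Python warns about it.
buffer = "last line without newline"
if buffer[-1] != "\n":
    buffer += "\n"

# 2. pickle.dump writes bytes, so the file must be opened in binary mode.
config = {"api": "https://vim.fandom.com/api.php", "xml": True}  # hypothetical
path = os.path.join(tempfile.mkdtemp(), "config.txt")

text_mode_error = None
try:
    with open(path, "w") as outfile:   # text mode reproduces the TypeError
        pickle.dump(config, outfile)
except TypeError as err:
    text_mode_error = str(err)

with open(path, "wb") as outfile:      # binary mode works
    pickle.dump(config, outfile)
with open(path, "rb") as infile:
    restored = pickle.load(infile)
```

Whether the port should keep pickle for the config file or switch to a text format is a separate design question that this sketch does not settle.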

@elsiehupp

Thanks. Could you pull the new changes and try it again? I didn’t finish the full dump myself, but it looks kind of like it should be working now. (As in, I think the problems I’m having at this point might be limited to the test script.)

FYI, to run, do the following:

$ pip install pipenv
$ pipenv run python dumpgenerator.py ...

Also, FYI, it’s easier to read if you wrap the output in three “tick” signs, like so:

```
[paste your output here]
```

@Sylphystia

Sylphystia commented Aug 18, 2021

FYI, to run, do the following:

$ pip install pipenv
$ pipenv run python dumpgenerator.py ...

Even after successfully installing pipenv, when I try running the script (with pipenv run ...) I get asked once again to install pipenv. Am I missing something?

It looks like a dump was started and then immediately aborted, though. It created the folder and the config.txt file.

@elsiehupp

It seems like the script is still relatively fussy. I was able to get this specific command to run on macOS:

% pipenv run python dumpgenerator.py https://vim.fandom.com/wiki/Vim_Tips_Wiki --xml --images

By contrast, the Wikiteam wiki wouldn’t download. For testing purposes, could you try exactly the same command with the Vim Wikia?

Even after successfully installing pipenv, when I try running the script (with pipenv run ...) I get asked once again to install pipenv. Am I missing something?

Can you try the instructions in the pipenv docs (a second time, if you’ve done so already); try running the above command; and then post exactly what the output is inside a pair of ``` (like the following) if it doesn’t work?

```
[paste your output here]
```

Also, please include what system you’re running, as well as the output of the commands $ which pipenv and $ python --version.

It looks like a dump was started and then immediately aborted, though. It created the folder and the config.txt file.

Weird. I mean, I’ve gotten failed dumps, too, but if pipenv itself is the problem, you shouldn’t be getting this far in the first place.

@elsiehupp

Oh, and also the output of $ git status from inside the wikiteam directory.
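
For convenience, the details requested above (system, Python version, location of pipenv, and git status) can be gathered in one go. This is a rough sketch using only the standard library; the function name `collect_diagnostics` and the report keys are made up for this example, and it needs Python 3.7 or later.

```python
import platform
import shutil
import subprocess
import sys


def collect_diagnostics(repo_dir="."):
    """Gather the environment details requested in this thread."""
    report = {
        "system": platform.platform(),
        "python": sys.version.split()[0],
        "pipenv": shutil.which("pipenv") or "not found on PATH",
    }
    git = shutil.which("git")
    if git is None:
        report["git status"] = "git not found on PATH"
    else:
        result = subprocess.run(
            [git, "status", "--short", "--branch"],
            cwd=repo_dir, capture_output=True, text=True,
        )
        # git prints errors (e.g. "not a git repository") on stderr.
        report["git status"] = (result.stdout or result.stderr).strip()
    return report


report = collect_diagnostics()
for key, value in report.items():
    print("%s: %s" % (key, value))
```

Run it from inside the wikiteam directory so that the git status refers to the right checkout.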

@GreenReaper

GreenReaper commented Sep 24, 2021

In a jail on FreeBSD 11.4-RELEASE-p9 amd64:

  • which pipenv -> /usr/local/bin/pipenv
  • python --version -> Python 3.8.12
  • git status -> On branch python3 / Your branch is up to date with 'origin/python3'

At first I had issues because I thought it was necessary to install poetry and wikiteams3 using pip, and then the new dumpgenerator.py by itself. That didn't work out so well. It led to the error above; when I added exception logging, I got this:

Traceback (most recent call last):
  File "/home/wikifur/dumpgenerator.py", line 37, in <module>
    import mwclient
ModuleNotFoundError: No module named 'mwclient'

This was confusing because even when I installed mwclient via pip, it didn't work. I also had to use pipenv --python 3.8 run ./dumpgenerator.py to even get that far, possibly because I had 2.7 installed at the same time.


Once I actually cloned the whole repo and checked out python3, it worked more smoothly, except that it broke when saving files:

./dumpgenerator.py --xml --xmlrevisions https://furry.wiki.opencura.com
[...namespaces downloaded...]
Titles saved at... furrywikiopencuracom_w-20210924-titles.txt
253 page titles loaded
https://furry.wiki.opencura.com/w/api.php
Getting the XML header from the API
Retrieving the XML for every page from the beginning
Traceback (most recent call last):
  File "./dumpgenerator.py", line 2839, in <module>
    main()
  File "./dumpgenerator.py", line 2830, in main
    createNewDump(config=config, other=other)
  File "./dumpgenerator.py", line 2350, in createNewDump
    generateXMLDump(config=config, titles=titles, session=other["session"])
  File "./dumpgenerator.py", line 822, in generateXMLDump
    xmlfile.write(header)
TypeError: a bytes-like object is required, not 'str'

My understanding is that "wb" worked when writing strings in 2.x but won't in 3.x, because strings are now Unicode. Instead, the file has to be opened as "w". Anyway, I made the following change and it worked:

diff --git a/dumpgenerator.py b/dumpgenerator.py
index f68f190..663aad3 100755
--- a/dumpgenerator.py
+++ b/dumpgenerator.py
@@ -818,7 +818,7 @@ def generateXMLDump(config={}, titles=[], start=None, session=None):
             xmlfile = open("%s/%s" % (config["path"], xmlfilename), "a")
         else:
             print("Retrieving the XML for every page from the beginning")
-            xmlfile = open("%s/%s" % (config["path"], xmlfilename), "wb")
+            xmlfile = open("%s/%s" % (config["path"], xmlfilename), "w")
             xmlfile.write(header)
         try:
             r_timestamp = "<timestamp>([^<]+)</timestamp>"
@@ -2514,7 +2514,7 @@ def saveSpecialVersion(config={}, session=None):
         raw = r.text
         delay(config=config, session=session)
         raw = removeIP(raw=raw)
-        with open("%s/Special:Version.html" % (config["path"]), "wb") as outfile:
+        with open("%s/Special:Version.html" % (config["path"]), "w") as outfile:
             outfile.write(raw)


@@ -2529,7 +2529,7 @@ def saveIndexPHP(config={}, session=None):
         raw = r.text
         delay(config=config, session=session)
         raw = removeIP(raw=raw)
-        with open("%s/index.html" % (config["path"]), "wb") as outfile:
+        with open("%s/index.html" % (config["path"]), "w") as outfile:
             outfile.write(raw)


This appeared to fix --xml and --xml --xmlrevisions (FWIW, it's not immediately obvious that --xmlrevisions requires --xml). There may be other changes that need to be made, but I have not tested that (for example, "wb" is used for images, but maybe that is correct because they are bytes?).
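
The general rule behind the change above: in Python 3, a file opened with "w" accepts str and a file opened with "wb" accepts bytes. In Requests, `r.text` is a str and `r.content` is bytes, so binary mode remains the right choice for data held as bytes, such as image content. A minimal sketch with made-up file names and sample data, not code from dumpgenerator.py:

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
header = "<mediawiki>\n"         # str, like r.text
image_data = b"\x89PNG\r\n\x1a\n"  # bytes, like r.content

# Writing str to a binary-mode file reproduces the error reported above.
binary_mode_error = None
try:
    with open(os.path.join(workdir, "scratch.xml"), "wb") as scratch:
        scratch.write(header)
except TypeError as err:
    binary_mode_error = str(err)

# Text goes to a text-mode file; naming the encoding avoids relying on
# the platform default.
xml_path = os.path.join(workdir, "dump.xml")
with open(xml_path, "w", encoding="utf-8") as xmlfile:
    xmlfile.write(header)

# Bytes go to a binary-mode file.
image_path = os.path.join(workdir, "image.png")
with open(image_path, "wb") as imagefile:
    imagefile.write(image_data)
```

An alternative is to keep "wb" and write `header.encode("utf-8")`; either way the choice of encoding becomes explicit.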

@elsiehupp

Hi @GreenReaper it’s been a month or two since I last worked on this, so it may take me a little bit to catch up with what’s going on here. Thank you for the detailed information, though!

@ImmoWetzel

ImmoWetzel commented Nov 15, 2021

Any news on this?
I also get

(.venv38) ixxx@devHost:~/workspace/wikitools3$ python dumpgenerator.py 
python: can't open file 'dumpgenerator.py': [Errno 2] No such file or directory
(.venv38) ixxx@devHost:~/workspace/wikitools3$ pip freeze
poster3==0.8.1
wikitools3==3.0.0

@elsiehupp

Hi @ImmoWetzel—if you pop over to the pull request at #409, the instructions for how to use the (still somewhat incomplete) Python 3 port are a bit more up-to-date there. I’ve added installation instructions at the top of the thread so you don’t have to read all the way through just to use the mostly working version of dumpgenerator.

Headline to grab people’s attention as necessary:

To use wikiteam3 visit #409 and follow the instructions there.

@cooperdk

cooperdk commented Jun 5, 2022

Can all y’all give #409 a spin? Thanks!

Personally, I've just moved to another method. I also lack the time to test-run the updated script, sorry. If this would've come just a bit earlier, I might have made different choices. I've been a bit surprised that so few users have made themselves heard here, even though this ticket has been open since September 2020. I'm guessing most MediaWiki admins have, and use, shell access for backups.

Makes very little sense, since you, in 2022, usually have shell access to a server if you have FTP or SQL access.

@nemobis nemobis mentioned this issue Jun 5, 2022
@nemobis
Member

nemobis commented Jun 5, 2022

#433 (comment)

But #395 is two years old and, as it has not been fixed, it would not be illogical to renew it. The scripts should have been ported even long before that report.

@OAHOR suggests using conda to install an environment, but it makes no sense because as I wrote, Python 2.7 is no longer safe to use.

I am contemplating whether or not to help @elsiehupp if time permits.

Up to you. We also have some slightly different approaches on which one could choose to base any further work:
#331
https://github.com/nemobis/wikiteam/tree/2to3

@kwekewk

kwekewk commented Jun 29, 2022

@OAHOR
It's still asking for the kitchen module?

Please install the kitchen module.
Please install or update the Requests module.

(wiki) C:\Users\karti>python --version

@elsiehupp

@kwekewk I'm not exactly sure where you're running into problems, but I made a tidy version of the instructions for using miniconda for a pull request if you'd like to give them a try. (They're basically just @OAHOR's instructions, though.)

As an alternative, you can try the mostly functional Python 3 port I've been working on. There are other people helping me with the port, as well, so if you run into difficulties with it, you can feel free to open an Issue on that repository, and one or more of us can take a look.

@kwekewk

kwekewk commented Jun 29, 2022

@kwekewk I'm not exactly sure where you're running into problems, but I made a tidy version of the instructions for using miniconda for a pull request if you'd like to give them a try. (They're basically just @OAHOR's instructions, though.)

As an alternative, you can try the mostly functional Python 3 port I've been working on. There are other people helping me with the port, as well, so if you run into difficulties with it, you can feel free to open an Issue on that repository, and one or more of us can take a look.

@elsiehupp Solved. Apparently I had to rerun the requirements command inside the conda environment: pip install --user --upgrade -r requirements.txt. Also, why can the downloader only download 40-50 images per minute?

@elsiehupp

Apparently I had to rerun the requirements command inside the conda environment: pip install --user --upgrade -r requirements.txt. Also, why can the downloader only download 40-50 images per minute?

The delay functionality exists to help avoid getting temporarily blocked by a remote server for sending too many requests too quickly.

You should be able to specify the delay in seconds with a parameter. (You can get a list of available parameters with the --help parameter.) I vaguely remember finding that 0.5 seconds seemed to be just slow enough not to get blocked, but presumably it varies by server.

Obviously you shouldn't need the delay functionality if you're running the script locally, but if you're running the script locally you should also be able to initiate an export from within the MediaWiki admin interface itself.
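
To illustrate the idea, here is a rough sketch of a delay between requests and of how it caps throughput. It is not the script's actual implementation: the names `fetch_politely` and `max_items_per_minute` are made up, and the 0.8 seconds per request used below is an invented figure, not a measurement.

```python
import time


def fetch_politely(urls, fetch, delay_seconds=0.5, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing between consecutive requests."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # no pause is needed before the first one
        results.append(fetch(url))
    return results


def max_items_per_minute(delay_seconds, request_seconds):
    """Upper bound on throughput when every item costs delay + request time."""
    return 60.0 / (delay_seconds + request_seconds)


# Demonstration with a fake fetch and a fake sleep that records the pauses.
pauses = []
pages = fetch_politely(
    ["a.png", "b.png", "c.png"],
    fetch=lambda url: url.upper(),
    delay_seconds=0.5,
    sleep=pauses.append,
)
rate = max_items_per_minute(delay_seconds=0.5, request_seconds=0.8)
```

With those invented numbers the bound comes out at roughly 46 items per minute, in the same range as the 40-50 images per minute reported above. The real split between delay and transfer time depends on the server and the connection.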
