Partially recreate a phpBB database from public HTML pages. Board, topic and post links will be retained by keeping their original ids, but private information such as e-mail addresses, passwords and private messages will not be restored. User accounts must be manually reset by an admin for continued use.
Due to phpBB's many customization options, plethora of styles and long version history, a lot of manual adjustment to phpbb-salvage must be performed to tailor the parsing to the forum you are targeting. The code in this repository was made to restore forum.mm2c.com to phpBB 3.3.8 from an HTML scrape of its phpBB 2.0.11 installation with a custom style based on "subBlue", custom profile fields, ranks, BBCode and smileys.
dotnet build
phpbb-salvage [-v] [-q] [-s outputtype outputfile] [-l inputtype inputfile] [-f sourcetype sourcefile] [datadir]
Supply one input type: datadir, -f type file or -l type file
-s Save parsed data in given format of type
Terminal (default, no file), SQL, JSON or Bin
-l Load previously parsed data in given format of type
JSON or Bin
-f Parse single file input for given source of type
Index, Forum, Topic, Memberlist or Profile
-v Increase verbosity
-q Quiet
Download forum webpages from a running server or the Wayback Machine. From a live server you can get all topic pages and profiles by downloading sequentially over known id ranges. From the Wayback Machine you can get URLs from the CDX API (see the sketch after the shell commands below). Be wary of the many unnecessary pages produced by an undirected crawl.
# Board index
> echo 'http://myforum/index.php' > urls
# User profiles
> printf 'http://myforum/profile.php?mode=viewprofile&u=%d\n' {2..100} >> urls
# Topics, first page
> printf 'http://myforum/viewtopic.php?t=%d\n' {1..1000} >> urls
# Download
> wget --append-output=log --limit-rate=100k --wait=5 --force-directories --timestamping --input-file=urls
# Topics, next page
> page=2; perpage=15; prevstart=$((perpage*(page-2))); start=$((perpage*(page-1))); rg --files-with-matches "start=$start\">Next</a></b></span></td>\$" | cut -d '&' -f 1 | sort -g -t '=' -k2 | sed -e "s/^/http:\/\/myforum\//;s/\$/\&start=$start/"
# Then download the printed URLs and repeat with page=3, 4, ... until no topics have further pages
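If scraping from the Wayback Machine instead, the CDX API can list the archived URLs for you. A minimal sketch, assuming the forum was captured under http://myforum/ (the 2id_ prefix asks the replay endpoint for the capture nearest in time, served raw without the Wayback toolbar):

# List unique, successfully captured topic URLs via the CDX API
> curl -s 'https://web.archive.org/cdx/search/cdx?url=myforum/viewtopic.php*&fl=original&filter=statuscode:200&collapse=urlkey' > urls
# Rewrite each URL to the Wayback replay endpoint before downloading
> sed -e 's|^http://|https://web.archive.org/web/2id_/http://|' urls > wayback-urls

Then feed wayback-urls to the same wget invocation as above.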
Use phpbb-salvage to parse the datadir. You will want to begin with individual files to test the parser and print the result in the terminal:
phpbb-salvage -v -f Topic 'datadir/viewtopic.php?t=1'
Once all data parses cleanly, you can store it in a binary or JSON cache file for fast loading while tweaking the output format:
phpbb-salvage -s Bin data.bin datadir
Generate SQL from the cached parse result:
phpbb-salvage -s SQL data.sql -l Bin data.bin
Run the generated SQL file in your favourite SQL client. phpbb-salvage is written for PostgreSQL; minor tweaks to the output are needed if you use another database.
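For example, with psql (hypothetical host, user and database names; adjust to your environment):

# Load the generated dump into a PostgreSQL database named "phpbb"
> psql --host=localhost --username=phpbb --dbname=phpbb --file=data.sql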
Then run phpBB's reparser tool on the server to convert BBCode in the imported text to phpBB's internal formatting:
php bin/phpbbcli.php reparser:reparse --ansi
An admin must manually set new e-mail addresses on the restored accounts so that users can initiate a password reset.
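This can be done per user in the ACP, or directly against the database. A minimal sketch using psql, assuming the default phpbb_ table prefix, a phpBB 3.3 schema and placeholder names (verify against your installation before running):

# Hypothetical example: give user id 2 a known address so the
# "Forgot your password?" flow can deliver a reset link
> psql --dbname=phpbb --command="UPDATE phpbb_users SET user_email = 'someone@example.com' WHERE user_id = 2;"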
Suggested improvements for aspiring phpbb-salvage contributors:
- Automate crawling of topic pages
- Improve parser robustness
- Handle SQL inserts of long posts without truncation
- Import avatar images from files
- Populate dummy poll votes
- Board categories
- Board sort order
- User tracking timestamps (new posts since last activity)