Dealing with merge conflicts

Overall workflow

  1. Merge the next n commits from PostgreSQL into a branch
  2. Commit the merge as is with all merge conflicts
  3. Push to a public fork where multiple people can hack on resolving the conflicts
  4. Fix all conflicts
  5. Continuously rebase the branch on top of Greenplum master and hack until ICW (installcheck-world) is green
  6. When ICW is green and the branch is rebased, redo the merge command from step 1 and use the diff as a template to create a clean merge commit in which all conflicts are resolved in a single commit
  7. Go to step 1.

Backported code

Some of the code in GPDB has already been backported from later versions of PostgreSQL. That causes a lot of merge conflicts, but in these files the correct resolution is usually to keep the GPDB code (i.e. the backported PostgreSQL code). It's better, though, to open the corresponding later upstream version and check whether our backport is complete, or whether we should now copy over more changes from that PostgreSQL version.

  • src/interfaces/libpq/ - based on PostgreSQL 9.1

  • src/backend/libpq/ - authentication code, based on PostgreSQL 9.0

  • src/backend/utils/xml.c - based on PostgreSQL 9.1

  • src/backend/utils/* - some files are based on later versions

  • src/bin/psql/ - based on PostgreSQL 9.0

Merge conflict zoo

heap_form_tuple vs heap_formtuple

static void
Exec_Listen(Relation lRel, const char *relname)
{
	HeapScanDesc scan;
	HeapTuple	tuple;
	Datum		values[Natts_pg_listener];
<<<<<<< HEAD
	bool		nulls[Natts_pg_listener];
=======
	char		nulls[Natts_pg_listener];
>>>>>>> eca1388629facd9e65d2c7ce405e079ba2bc60c4
	NameData	condname;
	bool		alreadyListener = false;
		values[10] = VXIDGetDatum(proc->backendId, proc->lxid);
		if (proc->pid != 0)
			values[11] = Int32GetDatum(proc->pid);
		else
<<<<<<< HEAD
			nulls[11] = true;
		values[12] = DirectFunctionCall1(textin,
					  CStringGetDatum((char *) GetLockmodeName(LOCK_LOCKMETHOD(*lock),
													  mode)));
=======
			nulls[11] = 'n';
		values[12] = CStringGetTextDatum(GetLockmodeName(LOCK_LOCKMETHOD(*lock), mode));
>>>>>>> eca1388629facd9e65d2c7ce405e079ba2bc60c4
		values[13] = BoolGetDatum(granted);

Description

In GPDB, we have replaced calls to the old heap_formtuple() function with heap_form_tuple(). The two are equivalent, but the format of the nulls array is different: the old heap_formtuple() takes a char array, with 'n' meaning null and ' ' meaning not null, while heap_form_tuple() takes a boolean array.

PostgreSQL also replaced all calls to heap_formtuple() with heap_form_tuple() during the 8.4 development cycle, so these conflicts will stop happening once we reach that point.

Resolution

Keep using the modern heap_form_tuple() with the boolean nulls array.
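
For illustration, here is a sketch of the nulls-related lines after resolving the hunk above in favor of the GPDB (HEAD) side:

	Datum		values[Natts_pg_listener];
	bool		nulls[Natts_pg_listener];	/* keep the bool array that heap_form_tuple() expects */

		if (proc->pid != 0)
			values[11] = Int32GetDatum(proc->pid);
		else
			nulls[11] = true;				/* boolean true, not the old 'n' character */

The values[12] lines (textin vs. CStringGetTextDatum) are a separate question and are left out of this sketch.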

MirroredLock stuff

	/*
	 * we should find entry, and begin scan of posting tree
	 * or just store posting list in memory
	 */

<<<<<<< HEAD
	prepareEntryScan(&btreeEntry, index, entry->entry, ginstate);
	btreeEntry.searchMode = TRUE;
	
	// -------- MirroredLock ----------
	MIRROREDLOCK_BUFMGR_LOCK;
	
=======
	prepareEntryScan(&btreeEntry, index, entry->attnum, entry->entry, ginstate);
	btreeEntry.searchMode = TRUE;
>>>>>>> eca1388629facd9e65d2c7ce405e079ba2bc60c4
	stackEntry = ginFindLeafPage(&btreeEntry, NULL);
	page = BufferGetPage(stackEntry->buffer);

Description

GPDB's file replication requires that the "MirroredLock" is held whenever any buffer is locked.

These will go away when we replace GPDB's file replication with upstream streaming WAL replication.

Resolution

Keep the MIRROREDLOCK_* stuff. The macros will also need to be added to any new incoming code that accesses buffers; omissions will be caught by assertions once you start running tests.
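
As a rough sketch, the hunk above would be resolved by taking the incoming upstream call (which now also passes entry->attnum) and keeping the GPDB lock macro in front of the buffer access:

	prepareEntryScan(&btreeEntry, index, entry->attnum, entry->entry, ginstate);
	btreeEntry.searchMode = TRUE;

	// -------- MirroredLock ----------
	MIRROREDLOCK_BUFMGR_LOCK;

	stackEntry = ginFindLeafPage(&btreeEntry, NULL);
	page = BufferGetPage(stackEntry->buffer);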

#includes

#include "postgres.h"

#include "access/hash.h"
<<<<<<< HEAD
#include "utils/memutils.h"
=======
#include "access/relscan.h"
#include "utils/memutils.h"
#include "utils/rel.h"
>>>>>>> eca1388629facd9e65d2c7ce405e079ba2bc60c4
#include "utils/resowner.h"

Description

Conflicts in #includes happen whenever #includes have been added in GPDB and the surrounding upstream #includes have now changed.

Resolution

Keep both sets of #includes. Ideally, we'd remove any unnecessary #includes, but if it's not immediately clear which ones are required, it's safe to just keep them all. Once you can get the file to compile, you can remove the #includes one by one to see which ones are still required. Have a look at the corresponding upstream list, though, and don't remove ones that are present upstream, even if the code compiles without them, in order to keep our diff footprint small.

If there are a lot of GPDB-added #includes, you can move them all into a separate group after all the upstream #includes. If there are only a few, you can keep them in the main list. Note that upstream keeps #includes in alphabetical order. There is no technical reason for that, it's just for the sake of tidiness, but let's try to do the same for GPDB-added #includes.
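
For the hunk above, for example, the resolution is simply the union of the two sides, kept in alphabetical order; since the only #include on the GPDB (HEAD) side, utils/memutils.h, is also present upstream, the result matches the incoming list:

#include "postgres.h"

#include "access/hash.h"
#include "access/relscan.h"
#include "utils/memutils.h"
#include "utils/rel.h"
#include "utils/resowner.h"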

Heap Checksums

		{
			((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
			PageClearFull(page);
<<<<<<< HEAD
			MarkBufferDirtyHint(buffer, relation);
=======
			SetBufferCommitInfoNeedsSave(buffer);
>>>>>>> eca1388629facd9e65d2c7ce405e079ba2bc60c4
		}

Description

These are caused by the backport of data checksums from PostgreSQL 9.3.

Resolution

Use MarkBufferDirtyHint. Any new calls to SetBufferCommitInfoNeedsSave that are being merged in will also need to be replaced with MarkBufferDirtyHint. SetBufferCommitInfoNeedsSave no longer exists in GPDB, so the compiler will remind us to do that.
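
Resolved, the hunk above simply keeps the GPDB call:

		{
			((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
			PageClearFull(page);
			MarkBufferDirtyHint(buffer, relation);	/* GPDB's replacement for SetBufferCommitInfoNeedsSave() */
		}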

Whitespace, but GPDB version looks more correct.

static Oid
lo_import_internal(text *filename, Oid lobjOid)
{
    File        fd;
    int         nbytes,
                tmp;
    char        buf[BUFSIZE];
    char        fnamebuf[MAXPGPATH];
    LargeObjectDesc *lobj;
<<<<<<< HEAD
    Oid         oid;

=======
    Oid oid;
    
>>>>>>> eca1388629facd9e65d2c7ce405e079ba2bc60c4

Description

Whitespace changes, but the old code is more correctly formatted than the new code being merged ("correctly" meaning "as pgindent would format it").

PostgreSQL runs pgindent on the whole source tree before each major release, but not during the development cycle. This case typically happens when the change has already been backported to GPDB from upstream, e.g. from the tip of REL8_4_STABLE, where it has already been pgindented. When the code was first committed, however, it was mis-indented, and we're now trying to merge in that mis-indented version.

Resolution

Keep the correctly indented version, to avoid unnecessary code churn. Taking the incoming mis-indented version wouldn't be too bad either, as it will get fixed later when we merge the pgindent run commit, but let's avoid the unnecessary back-and-forth.
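
Resolved, the tail of the declaration block above keeps the HEAD formatting:

    LargeObjectDesc *lobj;
    Oid         oid;			/* pgindent-style alignment, as in HEAD */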

pg_dump and cdb_dump

The pg_dump binary is to a large extent duplicated into cdb_dump, a Greenplum-specific dump utility. The code in src/bin/pg_dump/cdb contains large portions of the pg_dump code and must be kept in sync with API changes and new objects. While cdb_dump is actively being replaced, it still needs to be maintained until it is killed.

Conflicts against STABLE code (e.g. 9.4_STABLE vs. 9.5)

Some of the code from master may be from a STABLE branch which could result in merge conflicts against the next couple versions due to upstream Postgres backports into STABLE branches. To resolve these issues, it is suggested to find the STABLE commit, the current merge version's commit, and the current merge version's STABLE commit if any (e.g. find 9.4_STABLE commit, 9.5 commit, and 9.5_STABLE commit). Compare the commits and try to take the latest commit (e.g. 9.5_STABLE instead of 9.4_STABLE or 9.5). Usually when there are only two commits to compare (no current merge version STABLE commit), the code has been removed in the current merge version (e.g. 9.4_STABLE code was removed in the development cycle of 9.5) so the code should also removed in our merge iteration branch to keep in sync with upstream Postgres as much as possible.