Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Submission attachments list is not updated correctly after replacing an old image/file with a new one. #1696

Closed
JohnMwashuma opened this issue Oct 2, 2019 · 14 comments · Fixed by #1698
Assignees

Comments

@JohnMwashuma
Copy link
Member

JohnMwashuma commented Oct 2, 2019

Environment (local, stage, preview, production)

All

Expected behaviour

When an image/file in a submission is replaced, the old image/file should be replaced with the new one and deleted from the attachments list.

Actual behaviour

The old image/file is replaced successfully but the old image/file is still retained in the attachments list.

Steps to reproduce the behaviour

Use Enketo to edit a submission that has an attachment. Check the attachments for that submission and you will notice the previous attachment is still available.


Implementation plan

Blocking or related onadata issues

Aha! Link: https://ona.aha.io/features/PROD-213

@WinnyTroy
Copy link
Contributor

The current implementation when editing submissions with new attachment files does not delete the old attachment, but creates a new attachment to the same submission alongside the old attachment files.

This could be helpful when keeping track of the submission history, however, as we never delete attachment files, this approach could fill up our storage unnecessarily.

@ukanga @pld @ivermac @JohnMwashuma @lincmba is there a different approach we could use, as in this instance when one edits a submission with a new media file?

@pld
Copy link
Member

pld commented Oct 2, 2019

since this data is going in s3, and it's the way we're doing it now, I'm not particularly concerned about storage space.

If we can ignore attachments that are connected to submissions that are marked as deleted, would that be sufficient to address this error?

@JohnMwashuma
Copy link
Member Author

So i found a way of filtering the attachments list from the submission api endpoint response. The response has the list of attachment fields with the attachment names. I used these names to filter the attachments.

@pld
Copy link
Member

pld commented Oct 4, 2019 via email

@JohnMwashuma
Copy link
Member Author

yes it affects attachments, all attachments are downloaded even the ones that had been replaced.

@pld
Copy link
Member

pld commented Oct 4, 2019

sounds like we need a general fix then that only surfaces attachments to instance that do have deleted_at set to null

WinnyTroy added a commit that referenced this issue Oct 9, 2019
…ill populate deleted_at field for attachments. Fixes #1696
WinnyTroy added a commit that referenced this issue Oct 14, 2019
…ill populate deleted_at field for attachments. Fixes #1696
WinnyTroy added a commit that referenced this issue Oct 28, 2019
…ill populate deleted_at field for attachments. Fixes #1696
WinnyTroy added a commit that referenced this issue Nov 12, 2019
…ill populate deleted_at field for attachments. Fixes #1696
@ukanga
Copy link
Member

ukanga commented Dec 19, 2019

We may need to rethink this again - it will be more like we should not expose in any way the attachments that have been replaced. Hence the files/attachments endpoint should not expose these records at all. I like the idea of deleted_at, but I hesitate exposing that information on the API at all since it would serve no purpose, the endpoint should use the field only to return attachments that are not replaced or deleted.

@pld
Copy link
Member

pld commented Dec 19, 2019

Yes agree, that's how we use the deleted_at column in other tables

WinnyTroy added a commit that referenced this issue Jan 6, 2020
…ill populate deleted_at field for attachments. Fixes #1696
WinnyTroy added a commit that referenced this issue Jan 13, 2020
…ill populate deleted_at field for attachments. Fixes #1696
@WinnyTroy
Copy link
Contributor

WinnyTroy commented Jan 17, 2020

QA for this issue is proving to be abit tricky.
At the moment, deleting a submission cascade deletes all the attachments for the given submission.
However, when replacing the attachments in a submission from Enketo, we do not at any point interact with the AttachmentViewSet(And also the viewset is readonly), thus we do not trigger deletion of the replaced attachment.

After discussions with @DavisRayM and @ivermac , we discussed catching this update, while saving the attachments to the instance on this file

def save_attachments(xform, instance, media_files):
"""
Saves attachments for the given instance/submission.
"""
# upload_path = os.path.join(instance.xform.user.username, 'attachments')
for f in media_files:
filename, extension = os.path.splitext(f.name)
extension = extension.replace('.', '')
content_type = u'text/xml' \
if extension == Attachment.OSM else f.content_type
if extension == Attachment.OSM and not xform.instances_with_osm:
xform.instances_with_osm = True
xform.save()
filename = os.path.basename(f.name)
media_in_submission = (
filename in instance.get_expected_media() or
[instance.xml.decode('utf-8').find(filename) != -1 if
isinstance(instance.xml, bytes) else
instance.xml.find(filename) != -1])
if media_in_submission:
Attachment.objects.get_or_create(
instance=instance,
media_file=f,
mimetype=content_type,
name=filename,
extension=extension)
update_attachment_tracking(instance)

At this point we could check the existing attachments for that instance and compare with the list of media files received from the update request and call the soft_delete action to delete the replaced attachments.
CC: @pld @ukanga

@pld
Copy link
Member

pld commented Jan 21, 2020

how would we compare with the list of media files received?

is this that if we replace an attachment for an instance, we'd only want to delete the existing attachment for that same field?

in that case is the check that we see if there's an existing attachment for the field we're replacing the attachment in, and delete that existing one if there is?

I'm trying to avoid a loop over all attachments if know specifically what attachments are relevant.

@WinnyTroy
Copy link
Contributor

is this that if we replace an attachment for an instance, we'd only want to delete the existing attachment for that same field?

We would be deleting existing attachments to an instance that have been replaced (i.e if they are no longer in the media files received with that instance).
If the attachment is still present in the media files received, then it wasnt replaced, so those we wont delete.

in that case is the check that we see if there's an existing attachment for the field we're replacing the attachment in, and delete that existing one if there is?

yes, this is what we had in mind.
We can identify new attachments for a submission as so:

attachment_names = [
a.media_file.name.split('/')[-1]
for a in Attachment.objects.filter(instance=instance)
]
media_files = [f for f in media_files
if f.name not in attachment_names]

We can then identify the attachments that have been replaced and soft_delete those.

replaced_attachments = [a for a in Attachment.objects.filter(instance=instance) 
                if a.media_file.name.split('/')[-1] not in media_files]

I'm trying to avoid a loop over all attachments if know specifically what attachments are relevant.

True. Alternatively, we could use generator expressions as opposed to list comprehensions to filter these replaced_attachments, so that we dont load all the attachments to memory. This could be less expensive imo.

@pld
Copy link
Member

pld commented Jan 23, 2020

This sounds good, let's pursue your final comment below and try to do this without a loop.

True. Alternatively, we could use generator expressions as opposed to list comprehensions to filter these replaced_attachments, so that we dont load all the attachments to memory. This could be less expensive imo.

WinnyTroy added a commit that referenced this issue Feb 6, 2020
…ill populate deleted_at field for attachments. Fixes #1696
WinnyTroy added a commit that referenced this issue Feb 6, 2020
Create soft_delete action that will populate deleted_at field for attachments. Fixes #1696
Call soft_delete action when replacing attachments
@WNjihia
Copy link

WNjihia commented Feb 10, 2020

LGTM 🙂
Tested that deleted attachments do not appear on the submissions attachments when replaced
Tested that deleted attachments do not appear on the submissions attachments when deleted and not replaced
Tested this for audio files too

@WNjihia WNjihia added the QA+ PR passed QA testing label Feb 10, 2020
@ivermac
Copy link
Contributor

ivermac commented Feb 12, 2020

@WNjihia I think you want this comment and QA+ label on the PR rather than the issue

WinnyTroy added a commit that referenced this issue Feb 19, 2020
Create soft_delete action that will populate deleted_at field for attachments. Fixes #1696
Call soft_delete action when replacing attachments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants