This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
Receiving back EBMS:0009 when sending via Peppol AS4 to oxalis (potentially just new version). #236
Comments
Hi!
Hi @artjomsk, thank you for taking the time to comment on this. I thought the same about #200, but I am not sure whether Java configuration could come into play: communication may still work between APs running Oxalis (and maybe phase4) while failing with us, since our solution is not Java-based. Checking the two APs we have trouble with, I got:
and
So there goes my theory that this issue only has to do with the latest Oxalis release...
Did you manage to clarify the cause of the issue? It could be that the wrong receiver certificate was used during sending (e.g. cached SMP lookup results, or a certificate updated at the AP but not in the SMP).
Hi @dladlk, no clarity on this issue yet. If this is about signing and signature verification, then no lookup is involved: the sending party signs with its own Peppol certificate and embeds that certificate in the XML to be used for verification. The receiver only verifies that the XML signature is intact and that the certificate used for signing is a valid (non-revoked) Peppol certificate. It must be something about a different XML signature verification algorithm being used, or something along those lines. Are you experiencing a similar issue?
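As an illustration of the "no lookup on the verifying side" point above, here is a minimal sketch of a sender signing with its private key and shipping the verification key inside the message. It is a deliberate simplification and not how Oxalis or WS-Security does it: plain `SHA256withRSA` over raw bytes replaces XML Signature, an encoded public key stands in for the embedded Peppol certificate, and the certificate validity and revocation check is left out. The names `EmbeddedKeySignatureDemo` and `SignedMessage` are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyFactory;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PublicKey;
import java.security.Signature;
import java.security.spec.X509EncodedKeySpec;

final class EmbeddedKeySignatureDemo {

    /** Hypothetical stand-in for a signed message that carries its own verification key. */
    static final class SignedMessage {
        final byte[] payload;
        final byte[] signature;
        final byte[] embeddedKey; // assumption: replaces the embedded X.509 certificate

        SignedMessage(byte[] payload, byte[] signature, byte[] embeddedKey) {
            this.payload = payload;
            this.signature = signature;
            this.embeddedKey = embeddedKey;
        }
    }

    /** Generates a throwaway RSA key pair playing the role of the sender's certificate. */
    static KeyPair newSenderKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Sender side: sign the payload and embed the public key in the message. */
    static SignedMessage sign(KeyPair sender, byte[] payload) {
        try {
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(sender.getPrivate());
            signer.update(payload);
            return new SignedMessage(payload, signer.sign(), sender.getPublic().getEncoded());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Receiver side: verify using only what arrived in the message, with no lookup. */
    static boolean verify(SignedMessage message) {
        try {
            PublicKey key = KeyFactory.getInstance("RSA")
                    .generatePublic(new X509EncodedKeySpec(message.embeddedKey));
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(key);
            verifier.update(message.payload);
            return verifier.verify(message.signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair sender = newSenderKeyPair();
        SignedMessage message = sign(sender, "invoice".getBytes(StandardCharsets.UTF_8));
        System.out.println("untouched payload verifies: " + verify(message));

        message.payload[0] ^= 0x01; // flip one bit after signing
        System.out.println("altered payload verifies:   " + verify(message));
    }
}
```

In a real AS4 exchange the signature covers digests of the message parts, so the receiver has to recompute those digests over the decrypted attachment before the signature can be checked.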
No, we are not experiencing it; we are just curious and trying to help.
Are you able to reproduce the error?
So you are saying that things worked fine while the receiving AP was using an older Oxalis version (which version?). Could you check with the receiving AP what else changed on their end (a change in Java version, enabling of TLSv1.3, a conflicting library added or overridden, etc.) when they upgraded from the previous version of Oxalis to 6.4.0?
Unfortunately, the examples you gave do not support your theory. Under "Checking the two APs we get trouble with I got:" you listed "oxalis: 4.1.1", which was released on "06.02.2020" and is running on "1.8.0_382", and "oxalis: 6.4.0", which was released on "09.12.2023" and is running on "17.0.2". Note also that no other AP using Oxalis 6.4.0 has reported this issue. So I recommend that you investigate jointly and find the cause, or share more details with us, ideally a complete stack trace, so that the issue can be reproduced. If during your investigation you find an interoperability issue between the two libraries, please report back with details.
Hi @dladlk and @aaron-kumar , First of all, thank you both for stopping by to comment on this. Let me reply to everything one by one.
I've checked internally, and it was decided that it makes no sense for us to mention any details of the other parties without their permission. We've also notified both service providers (SPs) that we opened this GitHub issue, so they are free to join the conversation if they wish. We've been in contact with the SP that recently upgraded to 6.4.0, and with which we have the issue, to investigate further. I've shared the details of the certificate that is used for encrypting the message (based on the data from the SMP lookup) and a dump of the signed and encrypted message we sent them. We also double-checked all the signing and encryption algorithms used on our side; they match the specification one for one, and we provided these details as well. The last thing we heard from them is that they will try to reproduce the issue and come back to us. We have also told them that we are open to reproducing the issue with them in the test environment. During these conversations they shared the full error log they are able to see. Previously I shared the error log that was sent back in the AS4 error response:
From the stack trace I see that the error happens while decrypting the attachment payload in order to calculate its digest, which is then compared with the one included in the signature:
It happens once the input stream has been fully read (an attempt to read more data returns -1), so the Cipher tries to finalize decryption:
I would suggest that the receiver check the size of the transferred payload (e.g. via ingress logs of the POST size); maybe it exceeds some limit on their ingress. They could also check the CXF temporary folder, where "big" payloads (more than 127KB of zipped/encrypted data) are cached to the file system; something could be wrong with the file. You can also check whether small payloads are sent successfully while big ones fail, etc.
So far we consistently have this issue with these two parties in Production. Neither of them has responded to our offer to reproduce it in the test environment, but we remain open to it.
So, I've looked up the conversations with this AP dating from Nov 2023, when they still ran Oxalis 4.1.0 and Java 8. We were getting a 500 response back without any proper AS4 error message. When we reached out, the finding on their side was some warnings (not errors) logged with javax.crypto.AEADBadTagException: Tag mismatch. This was never fixed; they were planning to upgrade to a more recent Oxalis release that supports reporting in 2024, so this was put on hold. After the system upgrade on their side, the issue changed to the one that I reported. I should also mention that, as a temporary workaround at the end of 2023, we could use an old Oxalis release on our side (4.1.1), and it was able to send documents to this party. We compared everything in the SOAP envelope (both Messaging and Security) between the outgoing messages from Oxalis 4.1.1 and from our current solution and could not find any meaningful difference. It was also configured to use the same Peppol certificate and, of course, the same SML. Apart from one more party that was discovered later and that I mention in this GitHub issue, we have not had such issues with other APs, before or after, with our current solution.
Indeed, once I could look up the Oxalis version of the other AP, which turned out to be 4.1.1, it was clear that this is not specific to the latest Oxalis release. I suppose it has something to do with configuration or network setup one way or another, as by the looks of it the problem does not track the Oxalis or Java version alone. I will keep you updated if we have any findings.
Hmm, I will communicate this to the other party; thanks for a possible pointer, or at least something to investigate. However, I've just checked, and the last invoice used to reproduce the issue was about 71KB with SBDH, while the full AS4 message actually sent was around 50KB after encryption and compression. That doesn't sound that large. Also, not long ago we again verified our latest version against the Peppol Testbed v2 and passed all the tests, including the large file scenario, where the attachment is almost 10MB :) So I don't expect any problems there on our side.
Then size is not the cause, I agree :) The only thing we can conclude is that decryption fails after reading the full payload.
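A minimal sketch of why a GCM tag mismatch only surfaces after the whole payload has been read: with AES-GCM the authentication tag is checked when the cipher is finalized, so a wrong key and a single altered byte both fail at the same point, in `doFinal()`. This is not Oxalis, CXF or WSS4J code; the names `GcmTagMismatchDemo` and `tryDecrypt`, and the 128-bit key size, are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.AEADBadTagException;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

final class GcmTagMismatchDemo {

    /** Generates a fresh AES key (key size chosen for illustration). */
    static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Encrypts with AES-GCM; the 16-byte tag is appended to the ciphertext. */
    static byte[] encrypt(SecretKey key, byte[] iv, byte[] plainText) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
            return cipher.doFinal(plainText);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns "ok" if the tag verifies, "tag-mismatch" if finalization rejects it. */
    static String tryDecrypt(SecretKey key, byte[] iv, byte[] cipherText) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
            cipher.update(cipherText); // feeding data does not verify anything yet
            cipher.doFinal();          // the tag is checked here, at end of input
            return "ok";
        } catch (AEADBadTagException e) {
            return "tag-mismatch";
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SecretKey key = newKey();
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        byte[] cipherText = encrypt(key, iv, "invoice payload".getBytes(StandardCharsets.UTF_8));

        System.out.println("intact:    " + tryDecrypt(key, iv, cipherText));

        byte[] tampered = cipherText.clone();
        tampered[0] ^= 0x01; // a single flipped bit anywhere in the data
        System.out.println("tampered:  " + tryDecrypt(key, iv, tampered));

        System.out.println("wrong key: " + tryDecrypt(newKey(), iv, cipherText));
    }
}
```

The exception alone does not distinguish the two failing cases, which is consistent with the conclusion above: it says that the key or the bytes differ from what the sender used, and nothing more.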
Hello, Our infrastructure:
This is all hidden behind the AWS Api Gateway (it listens on our domain and provides the HTTPS connection), then requests are forwarded to the AWS ELB listening on the HTTP port. This allows us to have the Oxalis AS4 plugin accessible directly on our domain without having to add
We already tried the Java Temurin distribution instead of Corretto, with no result. This issue is limiting for us, as we are unable to get a production Peppol certificate, let alone operate as an Access Point for our customers. @aaron-kumar, would it be possible on your part to investigate this issue earlier than milestone 7.0.0? Thank you in advance for your feedback.
I've been planning to give an update for a while, but here it is finally. TLDR: for us the issue is still there when sending to one other Peppol access point.

Following the discussions here in February/March, we've concluded an internal analysis of this issue on our Access Point (AP). We found that there is indeed one other AP that we consistently have trouble sending documents to, and we get an error from them. We've reached out to them to look into it. There were also two other APs where a few documents we sent failed with similar errors, but that issue resolved itself after some time. There were no updates on our side in the meantime, so something must have changed on the receiving end. We reached out to both APs and unfortunately did not get any helpful details on what the solution could have been.

For the AP we still have trouble with, we have a temporary workaround: using an old Oxalis 4.1.1, which can still deliver messages to them. We were also able to reproduce the same issue in a test environment (using SMK), but have not reached any conclusion on what the issue is or how we can solve it. Also, when they deployed Oxalis 6.5.0 in the test environment, the reported error changed to 'Code: EBMS:0004, Message: EBMS:0004 Other PEPPOL:NOT_SERVICED' instead of the 'javax.xml.crypto.dsig.TransformException: java.io.IOException: javax.crypto.AEADBadTagException: Tag mismatch!' we saw previously. Checking the difference between 6.4.0 and 6.5.0, it is clear that this is just the extra error handling that was added, so it doesn't really help to track down the issue.
Hi, I am attaching the source of the above finding:
@Praedo4: If the reported error changed to 'Code: EBMS:0004, Message: EBMS:0004 Other PEPPOL:NOT_SERVICED', then it is certain that they are not using the right certificate; there is some problem with validation of the certificate. In general, EBMS_0009 is a generic type of error, thrown with a message where one is available.
Thanks @karelkryda for sharing; it will be helpful for others, especially those using AWS. With this, we also want to stress that Oxalis users should not just "fire & forget" or "dump" an issue on GitHub, but must continue investigating it from different perspectives. It is difficult, if not impossible, to simulate every kind of external or environmental factor in order to reproduce an issue.
Based on the discussion so far, the reported issue seems to be linked either to external entities or to certificate usage. This is not an Oxalis issue. Do report back if you can prove otherwise. Note: We strongly recommend that access points using an Oxalis version prior to 6.x.x upgrade to the latest version; otherwise you are non-compliant with the OpenPeppol specifications.
As per the discussion so far, it seems that the issue is due to other environmental factors.
Hi Arun and the Oxalis team,
I'm posting an issue here since I'm not aware of a better way to ask a question.
Our Peppol AP has recently started experiencing failed communication in production with other APs that are running Oxalis (at least two different APs so far). Nothing has changed on our side, and we still have no problems with the rest of the APs. We are not using Oxalis on our side, but the issue is reported by the receiving party, which does run Oxalis. The error we get back is 'EBMS:0009'. Error details below.
This seems to be related to digital signature verification; could it be related to #200? Maybe we can try to reproduce it in a test environment. I think our support department has already reached out to the other APs, but it would be good to know whether you can assist them or us in resolving this issue as soon as possible.
I also mention in the title that this is potentially only the case for the latest version, because we know that at least one of the APs we have this issue with recently upgraded to the most recent version of Oxalis (6.4.0 to our knowledge). We also know that plenty of other APs run Oxalis without this issue being observed.