Output matching differs between Windows and Linux. I created a database of file hashes which is around 340 MB. When I query around 100 mp3 files, the output differs between my Windows machine and my Raspberry Pi. The database is identical and the query is identical, but the Windows machine finds significantly more matches. Both are running Python 3.9. The database was created on the Windows machine and transferred to the Pi. Has anyone encountered something similar? #92

Open
CoenGoedhart1 opened this issue Nov 6, 2022 · 7 comments

Comments

@CoenGoedhart1

No description provided.

@dpwe
Owner

dpwe commented Nov 6, 2022 via email

@CoenGoedhart1
Author

Hi, thanks for responding. I did some tests with the same query and database on both Windows and Linux. I tried the original mp4 files, the afpk files (the peaks), and the precomputed afpt files. On Windows, the mp4 query finds significantly more matches with the mp4 files than with the afpt and afpk files. The afpk and afpt outputs are identical on Windows and Linux. Linux also finds significantly more matches with the mp4 files than with afpt and afpk. However, Linux with mp4 finds fewer matches than Windows with mp4. On both machines I installed the newest ffmpeg available. My goal is to find exactly the same matches on my Linux system as I found on Windows using the mp4 files.

"MP3 decoders (even different versions of mpg123) can differ in their net
delay, and an identically-timed query will get many more hits than a random
time alignment."

Can I conclude from this that a possible solution would be to build the entire database on the Linux system, since creating the database on a different platform could change the delay?

@dpwe
Owner

dpwe commented Nov 7, 2022 via email

@dpwe
Owner

dpwe commented Nov 7, 2022 via email

@CoenGoedhart1
Author

I did some more testing. I now have 3 systems: one Windows machine, one Pi running Linux, and one virtual Linux system in Docker. I created a database on the Pi and have 585 queries with audio downloaded from a different source. The database contains the original videos and the queries are tweets, so they are not the original files. I found that it does not matter on which system you create the database. I created one on Linux and one on Windows, and when each is used on the other system the results stay consistent. So adding mp4 files to the database works the same on all platforms. Only the query results differ. The Docker system and the Pi produce exactly the same output. The Windows system finds significantly more matches.

I also tried converting to wav.

On the Windows machine I ran the following ffmpeg command:
for %i in (*.mp4) do ffmpeg -i "%i" "%~ni.wav"
When querying, Windows gave the same results, but Linux did far worse than with the mp4 files.

On the Linux machine I also converted the same mp4 files to wav:
for i in *.mp4; do ffmpeg -i "$i" "${i%.*}.wav"; done
Transferring these to Windows and querying again gave the same good results; on Linux, the same bad results.

I think it can only be ffmpeg, right? I'm not sure how to investigate further.
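One way to test the decoder-delay explanation quoted earlier in the thread would be to decode the same clip on both platforms and measure the sample offset between the two results. The sketch below is only an illustration and not part of audfprint: `read_pcm16` and `estimate_lag` are hypothetical helper names, the WAV files are assumed to be 16-bit mono, and the lag search is a brute-force dot product that is only practical for short excerpts.

```python
import array
import sys
import wave


def read_pcm16(path):
    """Read a 16-bit mono WAV file into a list of ints (assumed format)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2 or wf.getnchannels() != 1:
            raise ValueError("this sketch assumes 16-bit mono WAV input")
        frames = wf.readframes(wf.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return list(samples)


def estimate_lag(ref, other, max_lag):
    """Return the lag in samples at which `other` best lines up with `ref`.

    A positive result means `other` is delayed relative to `ref`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            pairs = zip(ref, other[lag:])
        else:
            pairs = zip(ref[-lag:], other)
        score = sum(x * y for x, y in pairs)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

A nonzero lag between the Windows-decoded and Linux-decoded versions of one file would be consistent with the decoders differing in their net delay; a lag of zero would suggest looking elsewhere.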

@dpwe
Owner

dpwe commented Nov 8, 2022 via email

@CoenGoedhart1
Author

Hi Dan, thank you so much! I managed to reproduce the Windows results on the Raspberry Pi. Converting the mp4 files to wav with a sample rate of 11025 and setting FFMPEG to False in audio_read.py gave me exactly the same results. I'm looking into using the same decoder on both platforms, but for now I have a workable solution! Again, thanks for your help!
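For anyone wanting to repeat the conversion step, here is a minimal sketch of batch-converting mp4 files to 11025 Hz wav through ffmpeg from Python. The function names are hypothetical, and the downmix to mono (`-ac 1`) is an assumption that the comment above does not mention; drop it if you want to keep the original channel layout.

```python
import subprocess
from pathlib import Path


def build_wav_command(src, sample_rate=11025, channels=1):
    """Build an ffmpeg argument list that converts `src` to a .wav file.

    -ar sets the output sample rate; -ac sets the channel count
    (mono here is an assumption, not something stated in the thread).
    """
    dst = Path(src).with_suffix(".wav")
    return [
        "ffmpeg",
        "-i", str(src),
        "-ar", str(sample_rate),
        "-ac", str(channels),
        str(dst),
    ]


def convert_all(directory, sample_rate=11025):
    """Convert every .mp4 in `directory`; requires ffmpeg on the PATH."""
    for src in sorted(Path(directory).glob("*.mp4")):
        subprocess.run(build_wav_command(src, sample_rate), check=True)
```

Calling `convert_all(".")` in the folder holding the mp4 files would write one wav next to each input. Running the same conversion once and copying the wav files to every machine keeps the decoded samples identical across platforms.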
