Diarization Output Modified #1586

Merged
merged 13 commits into from
Jul 20, 2018

Conversation

happyhuman
Contributor

Printing the last paragraph only.

@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Jul 19, 2018
@happyhuman
Contributor Author

happyhuman commented Jul 19, 2018

@dizcology, @tswast, @sirtorry: I had to adjust the output in this sample a little. Can you please review this PR? Thanks.

words_info = result.alternatives[0].words
pieces = ['%s (%s)' % (word_info.word, word_info.speaker_tag)
          for word_info in words_info]
print(' '.join(pieces))
Member

I feel this might not illustrate the "typical" use case, where the developer might more likely want to group and join the words according to their speaker_tag.

Contributor Author

Interesting point. It's tough to say what the right use case is, but I see this as just a sample that shows the API, not a particular use case. Do you think we can keep it as is, or should we change it?

Member

I agree. In that sense, perhaps let's just iterate through words_info and print everything without the nice formatting of '%s (%s)'. The expected output of the test really confused me.

Contributor

I think I agree with the way it's formatted, but whatever you decide, consider this syntax:

pieces = ['{} ({})'.format(word_info.word, word_info.speaker_tag) for word_info in words_info]

Contributor Author

I made some changes to make the output look like this:

Speaker #1: I'm here
Speaker #2: hi I'd like to buy a Chrome Cast and I was wondering whether you could help me
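
A minimal sketch of how consecutive words could be grouped into speaker lines like the ones above. The `WordInfo` namedtuple, the `format_turns` helper, and the sample data are hypothetical stand-ins; only the `word` and `speaker_tag` field names come from the sample under review, and this is not necessarily how the PR implements it.

```python
import collections
import itertools

# Hypothetical stand-in for the per-word objects returned by the API.
WordInfo = collections.namedtuple('WordInfo', ['word', 'speaker_tag'])


def format_turns(words_info):
    """Join consecutive words that share a speaker_tag into one line each."""
    lines = []
    for tag, group in itertools.groupby(words_info,
                                        key=lambda w: w.speaker_tag):
        text = ' '.join(w.word for w in group)
        lines.append('Speaker #{}: {}'.format(tag, text))
    return lines


# Invented sample data for illustration only.
words_info = [
    WordInfo("I'm", 1), WordInfo('here', 1),
    WordInfo('hi', 2), WordInfo("I'd", 2), WordInfo('like', 2),
]
for line in format_turns(words_info):
    print(line)
```

Because `itertools.groupby` only merges adjacent items, a speaker who talks twice gets two separate lines, which keeps the conversation order intact.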

.format(i, alternative.transcript))
print('Speaker Tag for the first word: {}'
.format(alternative.words[0].speaker_tag))
result = response.results[-1]
Contributor

A comment here explaining why you're only taking the last result (instead of all of them) would probably be helpful.

Contributor Author

Good idea. Adding it.

@@ -46,7 +46,6 @@ def transcribe_file_with_enhanced_model(speech_file):
audio = speech.types.RecognitionAudio(content=content)
config = speech.types.RecognitionConfig(
encoding=speech.enums.RecognitionConfig.AudioEncoding.LINEAR16,
sample_rate_hertz=8000,
Contributor

Any particular reason for omitting this? For WAV files, the API can infer it, but in the general case it's probably a good idea to include the sample rate.

Contributor Author

I keep going back and forth on this. Sometimes it is useful, and other times it causes an error when the input file has a different sample rate.

.format(alternative.words[0].speaker_tag))
result = response.results[-1]
words_info = result.alternatives[0].words
pieces = ['%s (%s)' % (word_info.word, word_info.speaker_tag)
Contributor

Isn't '{} ({})'.format(word_info.word, word_info.speaker_tag) the preferred way to do this these days? Or is Thea no longer benevolent overlord?

Contributor Author

Good point. I will modify it.

@@ -46,7 +46,6 @@ def transcribe_file_with_enhanced_model(speech_file):
audio = speech.types.RecognitionAudio(content=content)
config = speech.types.RecognitionConfig(
encoding=speech.enums.RecognitionConfig.AudioEncoding.LINEAR16,
sample_rate_hertz=8000,
Contributor

Why is sample_rate removed? I am sure there is a good reason.

Contributor Author

We just had a discussion about it with Roy and Jerjou. If the input file has a different sample rate, it will cause an error. It is simpler just to omit it and the API figures it out on its own.
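
For callers who still want to set the rate explicitly, one way to avoid the mismatch described here is to read it from the WAV header rather than hardcoding it. This is a sketch using only the standard library; the helper name `wav_sample_rate` and the generated demo file are illustrative assumptions and are not part of the sample.

```python
import os
import tempfile
import wave


def wav_sample_rate(path):
    """Return the sample rate (Hz) recorded in a WAV file's header."""
    with wave.open(path, 'rb') as wav_file:
        return wav_file.getframerate()


# Write a tiny 8 kHz, mono, 16-bit WAV file so the sketch is self-contained.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.wav')
with wave.open(demo_path, 'wb') as wav_file:
    wav_file.setnchannels(1)
    wav_file.setsampwidth(2)       # 2 bytes per sample = 16-bit PCM
    wav_file.setframerate(8000)
    wav_file.writeframes(b'\x00\x00' * 80)

print(wav_sample_rate(demo_path))
```

The returned value could then be passed as `sample_rate_hertz`, so the config always matches the file being sent.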

@@ -51,10 +51,10 @@ def test_transcribe_file_with_auto_punctuation(capsys):

 def test_transcribe_diarization(capsys):
     transcribe_file_with_diarization(
-        os.path.join(RESOURCES, 'Google_Gnome.wav'))
+        os.path.join(RESOURCES, 'commercial_mono.wav'))
Contributor

Consider passing the file name as an argument rather than hardcoding it.

Contributor Author

Even for the unit tests?

.format(alternative.words[0].speaker_tag))
result = response.results[-1]
words_info = result.alternatives[0].words
pieces = ['%s (%s)' % (word_info.word, word_info.speaker_tag)
Contributor

pieces is an odd variable name.

@@ -156,21 +153,18 @@ def transcribe_file_with_diarization(speech_file):

config = speech.types.RecognitionConfig(
encoding=speech.enums.RecognitionConfig.AudioEncoding.LINEAR16,
sample_rate_hertz=16000,
language_code='en-US',
enable_speaker_diarization=True,
diarization_speaker_count=2)

print('Waiting for operation to complete...')
Contributor

Consider using Python's logging facility. Understandably, it might be overkill for this sample, so take it or leave it.

Contributor

FWIW nearly all our other Python samples do print(). It's true that it's not always the recommended practice in production, but it's easy to understand. With logging there's always the risk that the developer has some weird config where the logs end up where maybe they don't expect.
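
For readers weighing the two options, here is a small sketch of what the logging variant could look like. Attaching a handler that writes to stdout is one way to reduce the risk of logs ending up somewhere unexpected; the logger name is hypothetical and this is not what the sample does.

```python
import logging
import sys

# Send records to stdout explicitly so they appear where print() output would.
# (By default, logging.StreamHandler writes to stderr.)
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(logging.Formatter('%(levelname)s: %(message)s'))

logger = logging.getLogger('diarization_sample')  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # avoid duplicate output through the root logger

logger.info('Waiting for operation to complete...')
```

Compared with a single `print()` call this is noticeably more setup, which is the trade-off the comment above points at.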

@happyhuman
Contributor Author

@dizcology, @puneith: I made some changes based on the comments from all the reviewers. Can you please take a final look and approve the PR if it looks good?

print('Speaker Tag for the first word: {}'
.format(alternative.words[0].speaker_tag))
# response.results contains partial results with the last item
# containing the entire result:
Contributor

Mm... not quite. The transcript within each result is separate and sequential per result. However, the words list within an alternative (for whatever reason) includes all the words from all the results thus far. Thus, to get all the words with speaker tags, you only have to take the words list from the last result.
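
To make that shape concrete, here is a sketch that uses hand-built stand-ins for the response (via `types.SimpleNamespace`). The data is invented to mimic the behaviour described in this comment; it is not real API output.

```python
from types import SimpleNamespace as NS


def word(text, tag):
    """Build a stand-in for one entry of an alternative's words list."""
    return NS(word=text, speaker_tag=tag)


# Each result carries its own transcript segment, but (as described above)
# the words list of a later result also contains the earlier results' words.
first_words = [word('hello', 1), word('there', 1)]
second_words = first_words + [word('hi', 2)]

response = NS(results=[
    NS(alternatives=[NS(transcript='hello there', words=first_words)]),
    NS(alternatives=[NS(transcript=' hi', words=second_words)]),
])

# Transcripts: concatenate the segments from all results.
full_transcript = ''.join(
    result.alternatives[0].transcript for result in response.results)

# Words with speaker tags: take them from the last result only.
words_info = response.results[-1].alternatives[0].words

print(full_transcript)
print([(w.word, w.speaker_tag) for w in words_info])
```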

Contributor Author

I see. Thanks for the clarification. Let me update the comment.

Contributor

I'm not really understanding your comment, @jerjou, but this sounds like something that needs to be documented in the cloud.google.com docs, with a briefer explanation in the sample itself.

if speakers_words and speakers_words[-1][0] == word_info.speaker_tag:
    speakers_words[-1][1].append(word_info.word)
else:
    speakers_words.append((word_info.speaker_tag, [word_info.word, ]))
Contributor

This is a bit hard to read. An intermediate variable and a namedtuple would go a long way to making this clearer:

Speaker = collections.namedtuple('Speaker', ['tag', 'words'])
speaker_words = [Speaker(tag=0, words=[])]
for word_info in words_info:
  current_speaker = speaker_words[-1]
  if current_speaker.tag == word_info.speaker_tag:
    current_speaker.words.append(word_info.word)
  else:
    speaker_words.append(Speaker(tag=word_info.speaker_tag, words=[word_info.word]))

Contributor

Also, maybe speaker_sequence or something, to make it clear it's not just a speaker->words_they_spoke mapping, and is actually the conversation / words spoken in sequence.

Contributor Author

@happyhuman happyhuman Jul 20, 2018

Interesting idea. However, I think the code readability is mainly the result of the intermediate variable Speaker(tag=0, words=[]) as the first element in the list, allowing us to define current_speaker every time.
The downside is that we introduce a new object that is not returned by the API, and we have to handle it separately in the next step (either by removing it or by skipping it), which makes the code less clean in another way.
So, while I am not against the suggested solution, I am also not sure if it is really helping with the code readability that much.
What do you think @jerjou ?

Contributor Author

@happyhuman happyhuman Jul 20, 2018

What about this:

    speakers = []
    words = []
    for word_info in words_info:
        if (not speakers) or speakers[-1] != word_info.speaker_tag:
            speakers.append(word_info.speaker_tag)
            words.append([])
        words[-1].append(word_info.word)

I think this is more readable than the current code, without introducing the intermediate variable.

elif args.command == 'multi-language':
    transcribe_file_with_multilanguage(args.path, args.first, args.second)
Contributor

For future reference, argparse's sub-commands feature would be helpful to avoid having args that only matter for one command or another.
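
A sketch of what the sub-commands suggestion could look like. The `multi-language` command and the `path`, `first`, and `second` arguments mirror names visible in the snippet above; the `diarization` command name, the help strings, and the overall parser layout are assumptions made for illustration.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description='Speech samples (sketch).')
    subparsers = parser.add_subparsers(dest='command')

    # Each sub-command declares only the arguments it actually needs.
    diarization = subparsers.add_parser('diarization')  # assumed name
    diarization.add_argument('path', help='Audio file to transcribe')

    multi = subparsers.add_parser('multi-language')
    multi.add_argument('path', help='Audio file to transcribe')
    multi.add_argument('first', help='First language code')
    multi.add_argument('second', help='Second language code')
    return parser


args = build_parser().parse_args(
    ['multi-language', 'audio.wav', 'en-US', 'es'])
print(args.command, args.path, args.first, args.second)
```

With this layout, `first` and `second` exist only on the namespace produced by the `multi-language` command, so other commands never see arguments that do not apply to them.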

Contributor Author

Sounds good.

# Separating the words by who said what:
speakers_words = []
for word_info in words_info:
    if speakers_words and speakers_words[-1][0] == word_info.speaker_tag:
Contributor

If I understand correctly what this for loop is doing (creating a list of words for each tag), isn't it better to use a hashmap? Does this loop do what it's supposed to do?

Contributor

Jerjou is already reviewing this code, so I should stop reviewing. You are already in great hands :)

print('Speaker Tag for the first word: {}'
.format(alternative.words[0].speaker_tag))
# The transcript within each result is separate and sequential per result.
# However, the words list within an alternative (for whatever reason)
Contributor

I understand that this comment is to explain why the [-1], but I'm having trouble understanding what this is referring to. (Probably because I don't know what an "alternative" means in this context.)

I think you could probably get by with fewer words within the sample with just

# To get all words with speaker tags, you only have to take the words list from the last result.

and leave the extra explanation about alternatives for the actual cloud.google.com docs.

Contributor Author

@happyhuman happyhuman Jul 20, 2018

I initially had a shorter comment, but Jerjou suggested explaining it more, so I added the longer comment. I am okay either way, but I also think this is just a sample and not the formal documentation, so the sample does not need to go over all the details of what's what.

speakers = []
words = []
for word_info in words_info:
    if (not speakers) or speakers[-1] != word_info.speaker_tag:
Contributor

I think this implementation means that if speakers appear in certain orders you could end up with duplicates. For example:

word_info speakers A, B, B, A would result in a speakers list of A, B, A.

I think this sample would be clearer with a single dictionary of speakers to words rather than two lists.
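
A sketch of the dictionary variant suggested here, using invented sample data in the speaker order A, B, B, A from the example (written as tags 1, 2, 2, 1). The `WordInfo` namedtuple is a hypothetical stand-in for the API's per-word objects.

```python
import collections

# Hypothetical stand-in for the per-word objects returned by the API.
WordInfo = collections.namedtuple('WordInfo', ['word', 'speaker_tag'])

words_info = [
    WordInfo('hello', 1), WordInfo('hi', 2),
    WordInfo('there', 2), WordInfo('thanks', 1),
]

# Map each speaker_tag to every word that speaker said, in spoken order.
words_by_speaker = collections.defaultdict(list)
for word_info in words_info:
    words_by_speaker[word_info.speaker_tag].append(word_info.word)

for tag, words in sorted(words_by_speaker.items()):
    print('Speaker #{}: {}'.format(tag, ' '.join(words)))
```

Note the trade-off: each speaker appears exactly once, but the turn-by-turn order of the conversation can no longer be recovered from the result.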

Contributor Author

We had a long discussion about the code readability of this part of the sample. The API returns each word with an associated speaker_tag. So, if I am speaker #1 and I say "I am ok", the words 'I', 'am', and 'ok' will all be tagged with speaker_tag: 1.

Since the point of the sample is mostly just to showcase the API call and how to unbox the response object, I am beginning to think that this particular implementation is adding more confusion than clarity. I am leaning back toward a simpler implementation that just iterates over words and prints each word along with the associated speaker_tag. What do you think?

Contributor

Yes, printing each word with a speaker tag would be much clearer.

You could do something like

current_speaker = None
for word_info in words_info:
    if current_speaker is None or current_speaker != word_info.speaker_tag:
        current_speaker = word_info.speaker_tag
        print()
        print(current_speaker)
    print(word_info.word)

if you wanted to keep them grouped.

Contributor

@tswast tswast left a comment

The new loop for speaker tags is much easier to understand, thanks.

@happyhuman happyhuman dismissed puneith’s stale review July 20, 2018 23:20

Talked to Puneith offline about it. There is no issue to be resolved.

@happyhuman happyhuman merged commit c310941 into master Jul 20, 2018
@happyhuman happyhuman deleted the diarization-fix branch July 20, 2018 23:20
@jerjou
Contributor

jerjou commented Jul 31, 2018

Shoot. I totally dropped off this PR, didn't I. Sorry - bad email management -_-; Glad y'all just went on without me.

busunkim96 pushed a commit to googleapis/python-texttospeech that referenced this pull request Apr 9, 2020
…-samples#1586)

* Printing the last paragraph only.

* Python3 print

* Removing sample rate setting

* Adding the missing output parameter in the example

* Changes based on the comments

* Removed filenames as input parameters

* Removed unused args

* Updated README file

* Updated the inline comment

* Modified code to make it more readable

* Simplified the response object processing.

* Fixing the long line issue.
busunkim96 pushed a commit to googleapis/python-texttospeech that referenced this pull request May 13, 2020
busunkim96 pushed a commit to googleapis/python-texttospeech that referenced this pull request May 20, 2020
busunkim96 pushed a commit to busunkim96/python-speech that referenced this pull request Sep 1, 2020
busunkim96 pushed a commit to googleapis/python-speech that referenced this pull request Sep 3, 2020
msampathkumar pushed a commit that referenced this pull request Nov 10, 2022
telpirion pushed a commit that referenced this pull request Jan 13, 2023
dandhlee pushed a commit that referenced this pull request Feb 9, 2023
telpirion pushed a commit that referenced this pull request Mar 13, 2023
atulep pushed a commit to googleapis/google-cloud-python that referenced this pull request Apr 6, 2023
atulep pushed a commit to googleapis/google-cloud-python that referenced this pull request Apr 18, 2023
parthea pushed a commit to googleapis/google-cloud-python that referenced this pull request Sep 22, 2023
parthea pushed a commit to googleapis/google-cloud-python that referenced this pull request Oct 22, 2023