Using an STT model on a local GPU computer: execution time too long

Thread starter: Dinh Truong Anh Phuong (Guest)
I use the script below for speech-to-text in Google Colab; the running time is around 5 minutes for the uploaded recording.

Code:
from transformers import (pipeline, WhisperForConditionalGeneration,
                          WhisperTokenizer, WhisperFeatureExtractor)

model_dir = drive_path

# Initialize the ASR pipeline from the locally saved Whisper checkpoint
pipe = pipeline(
    "automatic-speech-recognition",
    model=WhisperForConditionalGeneration.from_pretrained(model_dir),
    tokenizer=WhisperTokenizer.from_pretrained(model_dir),
    feature_extractor=WhisperFeatureExtractor.from_pretrained(model_dir),
    chunk_length_s=5,
)

def transcribe(audio):
    # The pipeline accepts a file path and returns a dict with a "text" key
    text = pipe(audio)["text"]
    return text

audio_path = google_drive_audio_path
print(transcribe(audio_path))
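
For reference, this is the variant I would try next, letting the pipeline do the long-form chunking itself. chunk_length_s, stride_length_s, and batch_size are documented arguments of the transformers ASR pipeline; 30 seconds is Whisper's native window, and the batch size of 8 is only a guess to tune against available memory:

Code:
# Same components as above, but let the pipeline chunk the audio itself
pipe = pipeline(
    "automatic-speech-recognition",
    model=WhisperForConditionalGeneration.from_pretrained(model_dir),
    tokenizer=WhisperTokenizer.from_pretrained(model_dir),
    feature_extractor=WhisperFeatureExtractor.from_pretrained(model_dir),
    chunk_length_s=30,    # Whisper's native 30-second window
    stride_length_s=5,    # overlap so words at chunk borders are not cut
    batch_size=8,         # assumption: adjust to the machine's memory
)

print(pipe(audio_path)["text"])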

With the same code on my computer, it only printed out the last part of the audio content, and the running time was 7 minutes. I tried chunking it myself, but after waiting 30 minutes there was still no result, just blank output.

Code:
import concurrent.futures
import numpy as np
from pydub import AudioSegment
from pydub.utils import make_chunks

def process_chunk(chunk):
    # Convert pydub AudioSegment to a float32 numpy array
    samples = np.array(chunk.get_array_of_samples(), dtype=np.float32)
    # Normalize to [-1, 1]
    samples = samples / np.iinfo(chunk.array_type).max
    # Run the model once and reuse the result
    text = pipe(samples)["text"]
    print(text)
    return text

def transcribe(audio_path):
    # Split the audio into 5-second chunks
    audio = AudioSegment.from_file(audio_path)
    chunk_length_ms = 5000  # 5 seconds in milliseconds
    chunks = make_chunks(audio, chunk_length_ms)

    with concurrent.futures.ThreadPoolExecutor() as executor:
        # Process chunks in parallel
        results = list(executor.map(process_chunk, chunks))

    # Concatenate the text from all chunks
    text = " ".join(results)
    return text
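
One thing I suspect in the manual version: the numpy arrays go into the pipeline without a sampling rate, and pydub returns interleaved stereo samples at the file's original rate, so the feature extractor may interpret them wrongly. A sketch of the preprocessing I would try instead (the {"raw": ..., "sampling_rate": ...} input format is documented for the ASR pipeline, and 16 kHz mono is what Whisper expects):

Code:
def process_chunk(chunk):
    # Whisper is trained on 16 kHz mono audio, so convert the chunk first
    chunk = chunk.set_channels(1).set_frame_rate(16000)
    samples = np.array(chunk.get_array_of_samples(), dtype=np.float32)
    samples = samples / np.iinfo(chunk.array_type).max
    # Pass the sampling rate explicitly instead of letting it be assumed
    return pipe({"raw": samples, "sampling_rate": 16000})["text"]

On a CPU-only machine I would also drop the ThreadPoolExecutor and process the chunks sequentially, since the threads only contend for the same cores during inference.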

My computer doesn't have a GPU, so I only use the CPU, but on Google Colab I also use only the CPU.
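
For completeness, this is how I confirm which device the pipeline actually runs on (torch.cuda.is_available() is the standard check; device=-1 means CPU for transformers pipelines):

Code:
import torch

print("CUDA available:", torch.cuda.is_available())

# device=-1 runs on the CPU, device=0 on the first CUDA GPU
device = 0 if torch.cuda.is_available() else -1
pipe = pipeline("automatic-speech-recognition", model=model_dir, device=device)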

My questions:

  1. How can I change my code to get the whole content?
  2. Why does running the code in Google Colab get the whole content without any chunking, while running it locally only gets the last line?

Because my audio contains personal information, I cannot upload it to Google Drive.