The client is a B2B company that builds and customizes prototypes for large enterprises. Its focus is on developing innovative solutions tailored to each of its clients.

3 months of cooperation | IT

Tools Used:

Deep Learning | Python | Jupyter Notebook 


The project’s main objective was clear speech-to-speech translation between English and Chinese. The client’s requirements were not clearly defined, which complicated the speech-to-speech translation work.

The available data was very limited and did not cover all the required languages. The audio also needed complete pre-processing, including conversion from .mp3 to .wav format.

Another major issue was that every file had to be converted, which was impractical to do manually. The audio files also needed to be converted to Mel spectrograms, a visual representation of the audio signal.
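As an illustration of how the manual .mp3 to .wav conversion can be automated, here is a minimal sketch that calls the FFmpeg command-line tool from Python. The function names, the folder layout, the mono downmix, and the 16 kHz sample rate are assumptions made for this example and are not details taken from the project.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def build_ffmpeg_command(src: Path, dst: Path, sample_rate: int = 16000) -> list[str]:
    """Return the ffmpeg argument list that converts one audio file to mono WAV."""
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it already exists
        "-i", str(src),            # input file (.mp3)
        "-ac", "1",                # downmix to a single channel (assumed setting)
        "-ar", str(sample_rate),   # resample to the target rate (assumed 16 kHz)
        str(dst),                  # output format is inferred from the .wav suffix
    ]


def convert_folder(src_dir: Path, dst_dir: Path, sample_rate: int = 16000) -> list[Path]:
    """Convert every .mp3 directly inside src_dir and return the written .wav paths."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(src_dir.glob("*.mp3")):
        dst = dst_dir / src.with_suffix(".wav").name
        # check=True raises CalledProcessError if ffmpeg reports a failure
        subprocess.run(build_ffmpeg_command(src, dst, sample_rate), check=True)
        written.append(dst)
    return written
```

Calling convert_folder(Path("raw_mp3"), Path("wav")) would process a whole folder in one pass, provided the ffmpeg executable is available on the PATH. Both folder names are placeholders.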


To address this, we collected data suited to Chinese-to-English translation from two different Git repositories and merged it into a single dataset.
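The merge step can be sketched roughly as follows. The case study does not describe how the two repositories were structured, so this example assumes each one provides rows with an audio path, a Chinese transcript, and an English transcript under its own column names. All field names, column names, and the repository labels are hypothetical.

```python
from __future__ import annotations

# Common schema used for the merged dataset (assumed for this sketch).
COMMON_FIELDS = ("audio", "zh", "en")


def normalise(rows: list[dict], column_map: dict[str, str], source: str) -> list[dict]:
    """Rename source-specific columns to the common schema and tag each row's origin."""
    records = []
    for row in rows:
        record = {
            field: (row.get(column_map[field]) or "").strip()
            for field in COMMON_FIELDS
        }
        # Skip rows that lack an audio path or either transcript.
        if all(record.values()):
            record["source"] = source
            records.append(record)
    return records


def merge_datasets(*parts: list[dict]) -> list[dict]:
    """Concatenate already-normalised parts into a single dataset."""
    merged = []
    for part in parts:
        merged.extend(part)
    return merged


# Hypothetical rows standing in for the two repositories.
repo_a = normalise(
    [{"path": "a1.wav", "mandarin": "你好", "english": "hello"}],
    {"audio": "path", "zh": "mandarin", "en": "english"},
    source="repo_a",
)
repo_b = normalise(
    [{"file": "b1.wav", "src": "谢谢", "tgt": "thank you"}],
    {"audio": "file", "zh": "src", "en": "tgt"},
    source="repo_b",
)
dataset = merge_datasets(repo_a, repo_b)
```

Keeping a source tag on every row makes it possible to trace any sample back to the repository it came from after the merge.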

To generate Mel spectrograms for all the audio sources, we used FFmpeg and the librosa library. The spectrograms supported automatic speech recognition, and the data extracted from them allowed us to achieve our goal.

With the data merged into a single dataset and the audio processed, the client was able to understand it better, because the data was clean and customized to their needs.