I'm not much into audio engineering, so please be easy on me. I'm receiving an audio file as input, and need to detect whether the speaker is male or female. Any ideas how to go about doing this?
I'm using php, but am open to using other languages, and don't mind learning a little bit of sound theory as long as the time is proportionate to the task.
I can't really provide specific insight into this problem, but I'd start by reading the following article: Gender Classification from Speech.
That should at least give an idea of the concepts / methodologies involved (this article describes this quite well as far as I can tell).
First of all you will have to find pitch values, and one great algorithm for finding pitch values for voice can be found in this article: http://www.fon.hum.uva.nl/paul/papers/Proceedings_1993.pdf
It's amazingly accurate.
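For a rough idea of what pitch extraction looks like in code, here is a minimal autocorrelation-based sketch in Python/NumPy. It is only a toy stand-in for the algorithm in the paper (which adds windowing corrections and candidate tracking), and the 50-500 Hz search range is just an assumption about typical speaking voices.

```python
import numpy as np

def estimate_pitch(frame, sample_rate, fmin=50.0, fmax=500.0):
    """Rough pitch estimate for one voiced frame via autocorrelation.

    Toy stand-in for the algorithm in the linked paper; the real method
    corrects for the window shape and tracks candidates over time.
    """
    frame = frame - np.mean(frame)
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    # Limit the lag range to plausible voice periods.
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    lag = min_lag + np.argmax(corr[min_lag:max_lag])
    return sample_rate / lag

# Sanity check: a synthetic 120 Hz tone should come out near 120.
sr = 16000
t = np.arange(0, 0.04, 1.0 / sr)
print(estimate_pitch(np.sin(2 * np.pi * 120 * t), sr))
```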
I'm with Christophe, both in that I don't have much experience with this either, and in thinking that some research would be your best path.
If I had to take a stab at this though, I would guess that it would involve computing the frequency spectrum of the sample using Fourier transforms, and then figuring out where the mean frequency lies. Build up a large sample of male vs. female voices, across different cultures and languages, and then compare your specific sample's mean frequency to the established means for male vs. female.
I could be completely wrong though, so research is really your best bet.
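To make the FFT idea concrete, here is a small Python/NumPy sketch that reads a WAV file and computes a crude "mean frequency" (a spectral centroid restricted to the band where speech fundamentals usually sit). The 50-400 Hz band and the commented 165 Hz cutoff are illustrative assumptions, not established values; as noted above, real thresholds should come from a labelled sample set.

```python
import numpy as np
from scipy.io import wavfile

def dominant_frequency(wav_path):
    """Crude estimate of where the low-frequency spectral energy sits.

    Real gender classification would use pitch tracking and formants;
    this only illustrates the FFT + "mean frequency" idea from above.
    """
    sample_rate, samples = wavfile.read(wav_path)
    if samples.ndim > 1:                       # mix stereo down to mono
        samples = samples.mean(axis=1)
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    # Only consider the band where speech fundamentals live (roughly 50-400 Hz).
    band = (freqs > 50) & (freqs < 400)
    return np.sum(freqs[band] * spectrum[band]) / np.sum(spectrum[band])

# Entirely illustrative threshold; a real cutoff should come from
# labelled male/female samples as suggested above.
# print("male" if dominant_frequency("speech.wav") < 165 else "female")
```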
One approach would be to use artificial neural networks. You provide the neural net with some examples for training and it should hopefully learn to correctly classify the voices. You will probably have to do some feature extraction using Fourier transforms to get the data into a suitable form.
There are several papers about this kind of approach if you search on Google for "neural network speaker identification" but unfortunately I am not familiar enough with them to recommend any particular one.
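For a concrete (if heavily simplified) picture of that pipeline, here is a sketch using scikit-learn's MLPClassifier on coarse FFT features. The feature scheme, the layer size, and the commented-out variables (training_samples, training_labels, unknown_sample, sr) are all placeholders for your own data loading; none of this is taken from any particular paper.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def spectral_features(samples, sample_rate, n_bins=64):
    """Compress the magnitude spectrum into a fixed-length feature vector."""
    spectrum = np.abs(np.fft.rfft(samples))
    chunks = np.array_split(spectrum, n_bins)
    return np.array([chunk.mean() for chunk in chunks])

# X: one feature vector per labelled recording, y: 0 = male, 1 = female.
# Decoding the audio is left out; see the FFT example above.
# X = np.array([spectral_features(s, sr) for s in training_samples])
# y = np.array(training_labels)

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000)
# clf.fit(X, y)
# print(clf.predict([spectral_features(unknown_sample, sr)]))
```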
I have an arduino grabbing outside light level from an LDR at 1 minute intervals. I am therefore storing the data for each day as a time series dataset with a percentage light and a timestamp. The data looks like the below (although it is stored in a MySQL db) and produces this graph:
{"Timestamp":"2017-03-22 14:48:48","ExternalLight":"99.5"},{"Timestamp":"2017-03-22 14:47:46","ExternalLight":"99.6"},{"Timestamp":"2017-03-22 14:46:44","ExternalLight":"99.8"},{"Timestamp":"2017-03-22 14:45:42","ExternalLight":"99.8"},{"Timestamp":"2017-03-22 14:44:40","ExternalLight":"99.8"},{"Timestamp":"2017-03-22 14:43:38","ExternalLight":"99.8"},{"Timestamp":"2017-03-22 14:42:36","ExternalLight":"99.8"},{"Timestamp":"2017-03-22 14:41:34","ExternalLight":"99.7"},{"Timestamp":"2017-03-22 14:40:32","ExternalLight":"99.6"},{"Timestamp":"2017-03-22 14:39:30","ExternalLight":"99.5"},{"Timestamp":"2017-03-22 14:38:28","ExternalLight":"99.5"},{"Timestamp":"2017-03-22 14:37:26","ExternalLight":"99.6"},{"Timestamp":"2017-03-22 14:36:24","ExternalLight":"99.6"},{"Timestamp":"2017-03-22 14:35:22","ExternalLight":"99.8"},{"Timestamp":"2017-03-22 14:34:20","ExternalLight":"99.8"},{"Timestamp":"2017-03-22 14:33:18","ExternalLight":"99.8"},{"Timestamp":"2017-03-22 14:32:16","ExternalLight":"99.7"},{"Timestamp":"2017-03-22 14:31:14","ExternalLight":"99.6"},{"Timestamp":"2017-03-22 14:30:12","ExternalLight":"99.5"},
.......
I am looking for the most efficient way to identify the two specific changes - where it gets light in the morning, and where it gets dark in the evening. Would it be possible to do this using a MySQL query? Or will I need to select all of the data and process it using PHP? I am not really sure of the best way to start, so I am looking for some guidance!
Many thanks,
Chris
The short answer is that this is a data analysis problem and neither MySQL nor PHP is a good fit. I generally wouldn't suggest trying to do something like this in PHP, and I seriously doubt it is even possible in MySQL. A language designed more for data analysis and processing would work much better. Personally, I use Python for these kinds of tasks, which has excellent tools like numpy/scipy/matplotlib. What did you make your plot in? If that is an actual programming language, it might be a good choice.
The thing to do is to figure out the algorithm that you will use to measure these things, and then figure out how to implement that algorithm in your language of choice. The reality is that the question you are asking is much more complicated than it might seem at first glance. Taking a time series and building an algorithm to reliably extract information like "sunrise" and "sunset" can be surprisingly involved, especially when you account for things like variable weather conditions. This is probably a fairly straightforward problem to start with, but what you really need (from the sound of it) is not help selecting a technology (i.e. MySQL or PHP), but help building an actual algorithm. This may or may not be the best place for that. Have you done any data analysis before?
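To give a feel for what such an algorithm might look like once the data is out of MySQL, here is a minimal Python/NumPy sketch that smooths the readings and finds the first and last crossings of a brightness threshold. The 50% threshold, the 15-sample smoothing window, and the assumption that rows arrive oldest-first are all things you would need to tune against real data and weather noise.

```python
from datetime import datetime
import numpy as np

def light_transitions(rows, threshold=50.0, window=15):
    """Find the first crossing above the threshold (dawn) and the last
    crossing below it (dusk) in one day of readings.

    rows: list of (timestamp_string, light_percent) tuples, oldest first.
    """
    times = [datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t, _ in rows]
    light = np.array([float(v) for _, v in rows])
    # Moving-average smoothing so clouds don't create spurious crossings.
    smooth = np.convolve(light, np.ones(window) / window, mode="same")
    above = smooth >= threshold
    crossings = np.flatnonzero(np.diff(above.astype(int)))
    if len(crossings) == 0:
        return None, None
    dawn = times[crossings[0] + 1]     # first dark -> light transition
    dusk = times[crossings[-1] + 1]    # last light -> dark transition
    return dawn, dusk

# Example with the question's row format:
# rows = [("2017-03-22 06:01:00", "2.1"), ..., ("2017-03-22 19:30:00", "1.8")]
# print(light_transitions(rows))
```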
I'm currently building a website which will aggregate news articles and then categorise them based on their content. However, I would like to analyse the times at which the articles are published so that I can determine if there is some sort of trend occurring, thus allowing me to predict when the next article is likely to be published and also eliminating the need for frequent / unnecessary crawling attempts.
I've had a look on the internet and it seems that neural networks can be used for analysing time series; however, I haven't found any examples / code snippets that are beneficial or easy for a beginner to understand and then adapt.
These articles also seem to suggest that the inputs to a neural network should be either 0 or 1, so how would you go about creating a neural network in PHP that takes the following inputs (Unix timestamps) and is capable of outputting a single value?
1332193520
1342194916
1342196716
1342197376
1352197856
1362198756
Any help would be greatly appreciated.
It would be hard to train a NN in PHP. This should be done offline using some fast non-scripting language. Also, implementing, training and especially tuning a NN's parameters is not a trivial task.
If I were you I would go for linear SVMs, which outperform other methods for text classification and are quite simple to deploy (and theoretically elegant - by the way, NNs are old-fashioned). There are excellent implementations of SVMs such as
SVM Light and LIBSVM, written in multiple languages.
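If it helps to see the shape of such a setup, here is a minimal scikit-learn example (its LinearSVC is backed by LIBLINEAR, from the same group as LIBSVM). The toy corpus and labels are invented purely for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# Tiny illustrative corpus; in practice these would be your crawled
# articles and whatever categories you are assigning them.
train_texts = ["stocks fall on weak earnings", "team wins championship final",
               "central bank raises rates", "striker scores twice in derby"]
train_labels = ["finance", "sport", "finance", "sport"]

# TF-IDF features fed into a linear SVM.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(train_texts, train_labels)
print(model.predict(["bank cuts interest rates"]))   # expected: ['finance']
```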
I'm trying to nest material with the least drop or waste.
Table A
Qty Type Description Length
2 W 16x19 16'
3 W 16x19 12'
5 W 16x19 5'
2 W 5x9 3'
Table B
Type Description StockLength
W 16X19 20'
W 16X19 25'
W 16X19 40'
W 5X9 20'
I've looked all over, reading up on greedy algorithms, bin packing, the knapsack problem, 1D-CSP, branch and bound, brute force, and others. I'm pretty sure it is a cutting stock problem. I just need help coming up with the function(s) to run this. I don't have just one stock length but multiple, and a user may enter his own inventory of less common lengths. Any help figuring out a function or algorithm to use in PHP to come up with the optimized cutting pattern and stock lengths needed with the least waste would be greatly appreciated.
Thanks
If your question is "gimme the code", I am afraid that you have not given enough information to implement a good solution. If you read the whole of this answer, you will see why.
If your question is "gimme the algorithm", I am afraid you are looking for an answer in the wrong place. This is a technology-oriented site, not an algorithms-oriented one. Even though we programmers do of course understand algorithms (e.g., why it is inefficient to pass the same string to strlen in every iteration of a loop, or why bubble sort is not okay except for very short lists), most questions here are like "how do I use API X using language/framework Y?".
Answering complex algorithm questions like this one requires a certain kind of expertise (including, but not limited to, lots of mathematical ability). People in the field of operations research have worked on this kind of problem more than most of us ever have. Here is an introductory book on the topic.
As an engineer trying to find a practical solution to a real-world problem, I would first get answers for these questions:
How big is the average problem instance you are trying to solve? Since your generic problem is NP-complete (as Jitamaro already said), moderately big problem instances require the use of heuristics. If you are only going to solve small problem instances, you might be able to get away with implementing an algorithm that finds the exact optimum, but of course you would have to warn your users that they should not use your software to solve big problem instances.
Are there any patterns you could use to reduce the complexity of the problem? For example, do the items always or almost always come in specific sizes or quantities? If so, you could implement a greedy algorithm that focuses on yielding high-quality solutions for common scenarios.
What would be your optimality vs. computational efficiency tradeoff? If you only need a good answer, then you should not waste mental or computational effort trying to provide an optimal answer. Information, whether provided by a person or by a computer, is only useful if it is available when it is needed.
How much are your customers willing to pay for a high-quality solution? Unlike database or Web programming, which can be done by practically everyone because algorithms are kept to a minimum (e.g. you seldom code the exact procedure by which a SQL database provides the result of a query), operations research does require both mathematical and engineering skills. If you are not charging for them, you are losing money.
This looks to me like a variation of 1D bin packing. You may try a best-fit heuristic and then try it with different sortings of table B. Note that no polynomial-time algorithm can guarantee a solution better than 3/2 of the optimum, because this is an NP-complete problem. Here is a nice tutorial: http://m.developerfusion.com/article/5540/bin-packing. I used it a lot to solve my problem.
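For what a best-fit-decreasing heuristic might look like, here is a short sketch (shown in Python rather than PHP for brevity). It handles multiple stock lengths naively, ignores saw kerf, and is a heuristic baseline only, not an optimal cutting-stock solver.

```python
def cut_plan(demands, stock_lengths):
    """Greedy best-fit-decreasing sketch for 1D cutting stock.

    demands: list of (length, qty) pieces needed, same material type.
    stock_lengths: lengths available in unlimited supply.
    Returns a list of (stock_length, [pieces cut from it]).
    """
    pieces = sorted((l for l, q in demands for _ in range(q)), reverse=True)
    bars = []   # each entry: [stock_length, remaining, [pieces]]
    for piece in pieces:
        # Best fit: the open bar that would leave the least material over.
        candidates = [b for b in bars if b[1] >= piece]
        if candidates:
            bar = min(candidates, key=lambda b: b[1] - piece)
        else:
            # Open the smallest stock length that can hold the piece.
            usable = [s for s in stock_lengths if s >= piece]
            if not usable:
                raise ValueError(f"No stock long enough for piece {piece}")
            bar = [min(usable), min(usable), []]
            bars.append(bar)
        bar[1] -= piece
        bar[2].append(piece)
    return [(b[0], b[2]) for b in bars]

# W 16x19 demand from Table A against the W 16x19 stock in Table B (feet).
print(cut_plan([(16, 2), (12, 3), (5, 5)], [20, 25, 40]))
```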
What is the most efficient way to search through so many characters? What do you think?
Let's say the website is built in PHP and MySQL.
What should I learn to be able to build this as efficiently as possible? Are there any algorithms I should learn, or something?
Text indexing algorithm
Google uses a custom-made database solution called BigTable, http://en.wikipedia.org/wiki/Big_table, which runs distributed over hundreds of servers all over the world. So they're fast because they wrote the software specifically to be fast, and set up the hardware in such a way that they could squeeze the most out of it.
You can get a decent setup with PHP and MySQL, but once you start dealing with very large data sets, MySQL, or any other generic database, will start to buckle under the stress. If you want to learn more about this, a good place to start is to do a search for concurrency in database design (briefly explained in http://en.wikipedia.org/wiki/Concurrency_control amongst others), which is a topic way too large to cover in a stackoverflow reply =)
Google goes beyond simply optimizing the databases and the code. They also do a lot of distributed programming. While the exact mechanisms they use to power systems such as Gmail are guarded secrets, it is known that they have entire farms of computers networked, each working on parts of the index at any given time, rather than just one server.
For MySQL, look at the Full-Text Search Functions.
This is assuming your content is stored in the database (such as in a CMS).
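For reference, a full-text query looks roughly like the following. It is shown from Python (mysql-connector-python) for brevity, but the same MATCH ... AGAINST SQL works from PHP's mysqli/PDO. The articles table, its title/body columns, and the connection details are assumptions, and the table needs a FULLTEXT index first.

```python
import mysql.connector

# Assumes an `articles` table with a FULLTEXT index, e.g.:
#   ALTER TABLE articles ADD FULLTEXT(title, body);
conn = mysql.connector.connect(host="localhost", user="app",
                               password="secret", database="cms")
cursor = conn.cursor()
cursor.execute(
    "SELECT id, title, "
    "       MATCH(title, body) AGAINST (%s IN NATURAL LANGUAGE MODE) AS score "
    "FROM articles "
    "WHERE MATCH(title, body) AGAINST (%s IN NATURAL LANGUAGE MODE) "
    "ORDER BY score DESC LIMIT 10",
    ("search terms here", "search terms here"),
)
for article_id, title, score in cursor:
    print(article_id, title, score)
```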
I want to build something similar to Tunatic or Midomi (try them out if you're not sure what they do) and I'm wondering what algorithms I'd have to use. The idea I have about the workings of such applications is something like this:
1. have a big database with several songs
2. for each song in 1., reduce the quality / bit-rate (to 64 kbps for instance) and calculate the sound "hash"
3. have the sound / excerpt of the music you want to identify
4. for the song in 3., reduce the quality / bit-rate (again to 64 kbps) and calculate the sound "hash"
5. if the hash from 4. matches any of the hashes from 2., return the matched music
I thought of reducing the quality / bit-rate because of environmental noise and encoding differences.
Am I on the right track here? Can anyone provide me with any specific documentation or examples? Midomi seems to even recognize humming, which is pretty impressive! How do they do that?
Do sound hashes exist or is it something I just made up? If they do, how can I calculate them? And more importantly, how can I check if child-hash is in father-hash?
How would I go about building a similar system with Python (maybe a built-in module) or PHP?
Some examples (preferably in Python or PHP) will be greatly appreciated. Thanks in advance!
I do research in music information retrieval (MIR). The seminal paper on music fingerprinting is the one by Haitsma and Kalker from around 2002-03. Google should find it for you.
I read an early (really early; before 2000) white paper about Shazam's method. At that point, they just basically detected spectrotemporal peaks, and then hashed the peaks. I'm sure that procedure has evolved.
Both of these methods address music similarity at the signal level, i.e., they are robust to environmental distortions. I don't think they work well for query-by-humming (QBH). However, that is a different (yet related) problem with different (yet related) solutions, so you can find solutions in the literature. (Too many to name here.)
The ISMIR proceedings are freely available online. You can find valuable stuff there: http://www.ismir.net/
I agree with using an existing library like Marsyas. Depends on what you want. Numpy/Scipy is indispensable here, I think. Simple stuff can be written in Python on your own. Heck, if you need stuff like STFT or MFCC, I can email you code.
I worked on the periphery of a cool framework that implements several Music Information Retrieval techniques. I'm hardly an expert (edit: actually I'm nowhere close to an expert, just to clarify), but I can tell you that the Fast Fourier Transform is used all over the place with this stuff. Fourier analysis is wacky, but its application is pretty straightforward. Basically you can get a lot of information about audio when you analyze it in the frequency domain rather than the time domain. This is what Fourier analysis gives you.
That may be a bit off topic from what you want to do. In any case, there are some cool tools in the project to play with, as well as the source code for the core library itself: http://marsyas.sness.net
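As a tiny illustration of "analyze it in the frequency domain", here is a SciPy sketch that runs a short-time Fourier transform over a synthetic tone and reads off the strongest frequency bin in one frame. It has nothing to do with Marsyas itself; it just shows the kind of representation these tools are built on.

```python
import numpy as np
from scipy.signal import stft

# Synthetic stand-in for a decoded audio buffer: a 440 Hz tone with noise.
sample_rate = 22050
t = np.arange(0, 2.0, 1.0 / sample_rate)
audio = np.sin(2 * np.pi * 440 * t) + 0.05 * np.random.randn(len(t))

# STFT: rows are frequency bins, columns are short time frames.
freqs, times, Z = stft(audio, fs=sample_rate, nperseg=2048)
magnitude = np.abs(Z)

# The strongest bin in any frame should sit near 440 Hz.
print(freqs[magnitude[:, 10].argmax()])
```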
I recently ported my audio landmark-based fingerprinting system to Python:
https://github.com/dpwe/audfprint
It can recognize small (5-10 sec) excerpts from a reference database of 10s of thousands of tracks, and is quite robust to noise and channel distortions. It uses combinations of local spectral peaks, similar to the Shazam system.
This can only match the exact same track, since it relies on fine details of frequencies and time differences - it wouldn't even match different takes, certainly not cover versions or hums. As far as I understand, Midomi/SoundHound works by matching hums to each other (e.g. via dynamic time warping), then has a set of human-curated links between sets of hums and the intended music track.
Matching a hum directly to a music track ("Query by humming") is an ongoing research problem in music information retrieval, but is still pretty difficult. You can see abstracts for a set of systems evaluated last year at the MIREX 2013 QBSH Results.
MFCCs extracted from the music are very useful in finding the timbral similarity between songs; this is most often used to find similar songs. As pointed out by darren, Marsyas is a tool that can be used to extract MFCCs and find similar songs by converting the MFCCs into a single vector representation.
Other than MFCCs, rhythm is also used to find song similarity. There are a few papers presented at MIREX 2009
that will give you a good overview of the different algorithms and features that are most helpful in detecting music similarity.
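A minimal sketch of that MFCC-mean-vector idea, using librosa instead of Marsyas (the file names are placeholders, and real systems model more than just the mean):

```python
import librosa
import numpy as np
from scipy.spatial.distance import cosine

def mfcc_signature(path, n_mfcc=20):
    """Collapse a track's MFCC frames into one mean vector, as described
    above. Real systems also model the variance (e.g. with GMMs)."""
    y, sr = librosa.load(path)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

# Lower cosine distance = more similar timbre (paths are placeholders).
# a = mfcc_signature("song_a.mp3")
# b = mfcc_signature("song_b.mp3")
# print(cosine(a, b))
```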
The MusicBrainz project maintains such a database. You can make queries to it based on a fingerprint.
The project has existed for a while and has used different fingerprints in the past. See here for a list.
The latest fingerprint they are using is AcoustId. There is the Chromaprint library (also with Python bindings) with which you can create such fingerprints. You must feed it raw PCM data.
I have recently written a library in Python which does the decoding (using FFmpeg) and provides such functions as to generate the AcoustId fingerprint (using Chromaprint) and other things (also to play the stream via PortAudio). See here.
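If you would rather not wire up the bindings, a quick way to get an AcoustId-style fingerprint is to shell out to Chromaprint's fpcalc tool. This assumes fpcalc is installed on the PATH and prints its usual DURATION=/FINGERPRINT= lines; treat it as a sketch rather than a recommended integration.

```python
import subprocess

def acoustid_fingerprint(path):
    """Run Chromaprint's `fpcalc` command-line tool on an audio file and
    return (duration_seconds, fingerprint_string)."""
    output = subprocess.check_output(["fpcalc", path], text=True)
    # Parse the KEY=value lines that fpcalc prints by default.
    fields = dict(line.split("=", 1) for line in output.splitlines() if "=" in line)
    return float(fields["DURATION"]), fields["FINGERPRINT"]

# duration, fp = acoustid_fingerprint("track.mp3")
# The fingerprint string is what you would send to the AcoustId web service.
```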
It's been a while since I last did signal processing, but rather than downsampling you should look at frequency-domain representations (e.g. FFT or DCT). Then you could make a hash of sorts and search the database for the song containing that sequence.
The tricky part is making this search fast (maybe some papers on gene sequence search might be of interest). I suspect that iTunes also does some detection of instruments to narrow down the search.
I did read a paper about the method by which a certain music information retrieval service (no names mentioned) does it - by calculating the short-time Fourier transform over the sample of audio. The algorithm then picks out 'peaks' in the frequency domain, i.e. time positions and frequencies that have particularly high amplitude, and uses the time and frequency of these peaks to generate a hash. It turns out the hash has surprisingly few collisions between different samples, and also stands up to approximately 50% loss of the peak information.
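A toy version of that peak-then-hash idea, just to make the description concrete (this is not the service's actual algorithm; real systems pick peaks far more carefully and pack each pair into a fixed-width integer hash):

```python
import numpy as np
from scipy.signal import stft

def peak_hashes(audio, sample_rate, fan_out=5):
    """Hash pairs of nearby spectral peaks as (f1, f2, time_delta).

    Takes the STFT, keeps the loudest frequency bin per frame, then pairs
    each peak with the next few peaks after it. A toy sketch only.
    """
    _, _, Z = stft(audio, fs=sample_rate, nperseg=1024)
    magnitude = np.abs(Z)
    peak_bins = magnitude.argmax(axis=0)          # one peak per time frame
    hashes = set()
    for i, f1 in enumerate(peak_bins):
        for j in range(1, fan_out + 1):
            if i + j < len(peak_bins):
                hashes.add((int(f1), int(peak_bins[i + j]), j))
    return hashes

# Two recordings of the same track should share many hashes even with noise;
# matching means counting intersecting hashes per candidate in the database.
```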
Currently I'm developing a music search engine using ActionScript 3. The idea is to analyze the chords first and mark the frames (it's limited to mp3 files at the moment) where the frequency changes drastically (melody changes, ignoring noise). After that I do the same thing to the input sound, and match the results against the inverted files. The matching one determines the matching song.
For Axel's method, I think you shouldn't worry about whether the query is singing or just humming, since you are not implementing a speech recognition program. But I'm curious about your method which uses hash functions. Could you explain that to me?
For a query-by-humming feature, it is more complicated than the audio fingerprinting solution; the difficulty comes from:
how to efficiently collect the melody database in a real-world application? Many demo systems use MIDI to build it up, but the cost of a MIDI-based solution is prohibitive for a company.
how to deal with time variance: for example, a user's hum may be fast or slow. Use DTW? Yes, DTW is a very good solution for dealing with time series with time variance, BUT it costs too much CPU load (a minimal DTW is sketched after this answer).
how to index the time series?
Here is a demo query-by-humming open source project, https://github.com/EmilioMolina/QueryBySingingHumming, which could serve as a reference.
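To illustrate point 2 above, here is a plain dynamic time warping distance in Python/NumPy. The quadratic cost table is exactly the CPU load being complained about; practical systems add banding (e.g. Sakoe-Chiba) or lower bounds to prune it.

```python
import numpy as np

def dtw_distance(a, b):
    """O(len(a) * len(b)) dynamic time warping distance between two 1-D
    sequences, e.g. the pitch contours of two hums."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Best of insertion, deletion, or match from the previous cells.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

# A hummed contour and a slowed-down copy of it still come out close:
query = np.sin(np.linspace(0, 3 * np.pi, 60))
slower = np.sin(np.linspace(0, 3 * np.pi, 90))
print(dtw_distance(query, slower))
```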