I was thinking about an idea of auto generated answers, well the answer would actually be a url instead of an actual answer, but that's not the point.
The idea is this:
On our app we've got a reporting module which basically shows page views, clicks, conversions, and details about visitors like where they're from - pretty much a similar thing to Google Analytics, but far more simplified.
Now I was thinking that instead of making users select things like countries and traffic sources from dropdown menus (these features would be available as well), it would be pretty cool to let them type in questions which would result in a link to the part of the report they expect. An example:
How many conversions I had from Japan on variant (one page can have many variants) 3.
would result in:
/campaign/report/filter/campaign/(current campaign id they're on)/country/Japan/variant/3/
It doesn't seem too hard to do it myself, but it's just that it would take quite a while to make it accurate enough.
I've tried googling but had no luck finding an existing script, so maybe you guys know of something similar to my idea that's open source and reliable/flexible enough to suit my needs.
Thanks!
You are talking about natural language processing - an artificial intelligence topic. This can never be perfect, and eventually boils down to the system only responding to a finite number of permutations of one question.
That said, if that is fine with you - then you simply need to identify "tokens". For example,
how many - evaluate to count
conversions - evaluate to all "conversions"
from - apply a filter...
japan - ...using japan
etc.
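To make the token idea concrete, here is a minimal sketch in Python. The country list, the function name question_to_url and the campaign id are all made up for illustration, and the URL layout is simply copied from the question. It only recognises the handful of tokens it is told about, which is exactly the "finite number of permutations" limitation mentioned above.

```python
import re

# Hypothetical vocabulary: lowercase token -> value used in the URL.
COUNTRIES = {"japan": "Japan", "germany": "Germany", "brazil": "Brazil"}


def question_to_url(question, campaign_id):
    """Turn a typed question into a report URL by spotting known tokens."""
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    parts = ["campaign", "report", "filter", "campaign", str(campaign_id)]
    for i, tok in enumerate(tokens):
        if tok in COUNTRIES:
            # A known country name becomes a country filter.
            parts += ["country", COUNTRIES[tok]]
        elif tok == "variant":
            # Use the first number that follows the word "variant".
            for nxt in tokens[i + 1:]:
                if nxt.isdigit():
                    parts += ["variant", nxt]
                    break
    return "/" + "/".join(parts) + "/"


print(question_to_url("How many conversions I had from Japan on variant 3", 42))
# /campaign/report/filter/campaign/42/country/Japan/variant/3/
```

Anything the tokenizer does not recognise is silently ignored here; a real version would probably want to tell the user which words it could not interpret.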
I currently have 100 stores, each with a separate database. I want to develop a web portal which will display all 100 stores, so that a search on the portal returns products from all 100 stores rather than requiring a visit to each store's website. I was using XML for this purpose, but it takes too long to parse the XML files and filter each store's records by the search keyword. I was generating the XML whenever a store added or edited a product record, and on the portal website I was just parsing these generated XML files (using PHP).
Please guide me if there is any better solution than XML parsing. Let me make one thing clear: all these stores and the portal are hosted on the same server, using a subdomain for each store.
Thanks in advance.
Use a single database. Distinguish between stores with a store table that you reference with a foreign key column in any table where it is relevant (which is likely only going to be the stock table).
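A rough sketch of what that single-database layout could look like, using Python's built-in sqlite3 so it runs anywhere (the same idea carries over to MySQL). The table names, column names and sample rows are hypothetical, not taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE product (
    id       INTEGER PRIMARY KEY,
    store_id INTEGER NOT NULL REFERENCES store(id),  -- which store owns the row
    name     TEXT NOT NULL,
    price    REAL NOT NULL
);
""")
conn.executemany("INSERT INTO store (id, name) VALUES (?, ?)",
                 [(1, "North"), (2, "South")])
conn.executemany("INSERT INTO product (store_id, name, price) VALUES (?, ?, ?)",
                 [(1, "Red kettle", 19.5), (2, "Blue kettle", 21.0), (2, "Toaster", 30.0)])


def search_products(conn, keyword):
    """Search every store's products with one query instead of 100 XML files."""
    rows = conn.execute(
        "SELECT s.name, p.name, p.price "
        "FROM product p JOIN store s ON s.id = p.store_id "
        "WHERE p.name LIKE ? ORDER BY s.name, p.name",
        (f"%{keyword}%",),
    )
    return rows.fetchall()


print(search_products(conn, "kettle"))
# [('North', 'Red kettle', 19.5), ('South', 'Blue kettle', 21.0)]
```

Each store's own site would then filter on its store_id, while the portal simply leaves that filter out.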
My first piece of advice to you is this: HIRE SOMEONE! If you have 100 stores, you certainly have the cashflow to devote to some sort of development project that is managed and planned by a professional.
Secondly, you are looking for someone with extensive database skill and experience. Please take the time to hire the right person, then get out of their way and let them do the job. If there is one thing I have learned in my time in the business world, it's that the single greatest hindrance to a job well done by IT, is a "boss" that doesn't recognize when he needs to hand over the reins to someone else who knows better.
The best way to think of it is this... if you know nothing about databases, and you were to start TODAY to learn everything you'd need to learn to do it RIGHT, you could expend the equivalent of a few working years in doing that. Is it worth the loss of productivity for your business? Probably not. So pay a guy $60,000 to $80,000 depending on his capabilities, and have him do it for you. You get a final product that's far more certain to be done right and work well, and you get it sooner, so you can get a faster ROI.
As far as what technology to use? I'm not even going to try answering that... it's not really for you to know or decide. Hire the right person, and let them tell you what you need.
I think Sphinx (sphinxsearch) fits your requirement.
Sphinx is an open source full-text search server which accepts input from various sources such as MySQL and XML.
In your case you can use XML or MySQL as the input source for the indexes.
The key point with Sphinx is that once your index is built, search responses will be very quick. You can also update your indexes in real time (for new products added to the system).
Hope this helps.
~K
I have a unique problem, I need to pull specific attributes for every game that is being played every 5 minutes, the two main issues I have are:
Parsing data from a website that displays it interactively, e.g. MLB.com, ESPN, CBS Sports.
Finding a source that would perhaps show the box scores that are updated live and in a text format.
I have done significant Googling and looked at possible solutions for scraping data off MLB and CBS Sports. I haven't had much luck. It's a bit difficult right now because I don't have any fresh data to play with, but I've been looking for possible solutions and haven't come to any resolution.
To my knowledge there isn't an open database I can query that contains live updated scores; otherwise I could piggyback off of that or obtain a similar system.
Check out this forum question on another site. It looks like there are a few sources out there that will let you get CSVs of their data. I'm not sure how much of it could be automated.
http://ask.metafilter.com/120399/MLB-API
Another is http://www.baseball-reference.com/ I'm not sure if they do box scores but they have stats on all the players, games, etc. They might have something you can use as well.
Finally you could check out http://www.strat-o-matic.com/ they might have something or be willing to create an API for you.
If you look at Yahoo, they get their stats from STATS LLC. I have no idea what it costs, but you should check out their real-time data delivery service.
Scrape the MLB gameday server. It is updated in realtime during games. If you want the boxscore, scrape boxscore.xml (for example)
I'm planning on integrating a reasonable ranking/voting system into an existing application.
I'm familiar with how traditional 5 star rating systems work and know the common pitfalls/problems associated with them, so I was wondering if there are other ways (I've heard of Wilson, Bayesian etc. but am not really sure how to implement them with the structure below):
I'm planning on allowing users to vote on content from 1 to 10 via the content's page.
The score and total votes for that content will be displayed on the content's page.
I will also be listing the Top 10 Content, so I'd need the method to be fair/realistic and not let a single vote of 10 send an item straight to number 1.
I'm using PHP and MySQL, I have a table for the content (which has a content_id which I guess I can JOIN on).
I'm wondering if you can suggest a method which achieves the above. I'd appreciate it if you could attach some example PHP code and an example MySQL schema so I can better understand it. I've googled and may have found potential solutions such as Wilson and Bayesian, yet they come as lengthy articles with confusing mathematical equations and don't show how to achieve the above (i.e. computing the score and implementing the method in PHP/MySQL), or at least, with no example PHP/MySQL code, I'm misunderstanding them.
Perhaps this is easier than I think - I don't know, as I've never had the need to implement this sort of "more complex" ranking/voting functionality before - so I'd appreciate your responses.
You should start by watching this video on youtube : Building Web Reputation Systems.
To emphasize the point, let me direct you to XKCD.
As for DB structure, you need the following parts:
list of items (with a total_votes column)
list of users who have voted
intersection table for items-users (with a rating column, if you go with the 5 star thing)
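On the "a single 10 should not jump to number 1" requirement from the question: one common approach is a Bayesian average, which pulls items with few votes toward the overall mean. The sketch below is in Python rather than PHP and is only an illustration. The function names, the prior_weight of 5 and the sample numbers are assumptions, and each item is given as (content_id, sum of ratings, number of votes), which you could get from the intersection table with SUM(rating) and COUNT(*).

```python
def bayesian_average(vote_sum, vote_count, prior_mean, prior_weight):
    """Blend an item's own ratings with prior_weight imaginary votes at prior_mean."""
    return (prior_weight * prior_mean + vote_sum) / (prior_weight + vote_count)


def top_content(items, prior_weight=5, limit=10):
    """items: list of (content_id, sum_of_ratings, number_of_votes)."""
    total_votes = sum(n for _, _, n in items)
    total_sum = sum(s for _, s, _ in items)
    # Use the mean rating across all content as the prior.
    prior_mean = total_sum / total_votes if total_votes else 0.0
    scored = [
        (cid, bayesian_average(s, n, prior_mean, prior_weight))
        for cid, s, n in items
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


items = [
    ("a", 10, 1),    # a single vote of 10
    ("b", 450, 50),  # fifty votes averaging 9
    ("c", 200, 40),  # forty votes averaging 5
]
print([cid for cid, _ in top_content(items)])
# ['b', 'a', 'c']
```

A larger prior_weight makes new items climb more slowly; the right value depends on how many votes a typical item receives.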
I'm a hobbyist, and started learning PHP last September solely to build a hobby website that I had always wished and dreamed another more competent person might make.
I enjoy programming, but I have little free time and enjoy a wide range of other interests and activities.
I feel learning PHP alone can probably allow me to create 98% of the desired features for my site, but that last 2% is awfully appealing:
The most powerful tool of the site is an advanced search page that picks through a 1000+ record game scenario database. Users can data-mine to tremendous depths - this advanced page has upwards of 50 different potential variables. It's designed to allow the hardcore user to search on almost any possible combination of data in our database and it works well. Those who aren't interested in wading through the sea of options may use the Basic Search, which is comprised of the most popular parts of the Advanced search.
Because the advanced search is so comprehensive, and because the database is rather small (less than 1,200 potential hits maximum), with each variable you choose to include the likelihood of getting any qualifying results at all drops dramatically.
In my fantasy land where I can wield AJAX as if it were Excalibur, my users would have a realtime Total Results counter in the corner of their screen as they used this page, which would automatically update its query structure and report how many results will be displayed with the addition of each variable. In this way it would be effortless to know just how many variables are enough, and when you've gone and added one that zeroes out the results set.
A somewhat similar implementation, at least visually, would be the Subtotal sidebar when building a new custom computer on IBuyPower.com
For those of you actually still reading this, my question is really rather simple:
Given the time & ability constraints outlined above, would I be able to learn just enough AJAX (or whatever) needed to pull this one feature off without too much trouble? would I be able to more or less drop-in a pre-written code snippet and tweak to fit? or should I consider opening my code up to a trusted & capable individual in the future for this implementation? (assuming I can find one...)
Thank you.
This is a great project for a beginner to tackle.
First I'd say look into using a library like jQuery (jquery.com). It will simplify the JavaScript part of this and the manual is very good.
What you're looking to do can be broken down into a few steps:
The user changes a field on the advanced search page.
The user's browser collects all the field values and sends them back to the server.
The server performs a search with the values and returns the number of results.
The user's browser receives the number of results and updates the display.
Now for implementation details:
This can be accomplished with JavaScript events such as onchange and onfocus.
You could collect the field values into a JavaScript object, serialize the object to JSON and send it using AJAX to a PHP page on your server.
The server page (in PHP) will read the JSON object and use the data to search, then send back the result as markup or text.
You can then display the result directly in the browser.
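The server-side step is the part that is easiest to get wrong, so here is a rough sketch of it. It is written in Python with sqlite3 only to keep it self-contained; the same shape works in PHP with MySQL. The scenario table, its columns and the ALLOWED_FIELDS whitelist are all hypothetical. The point is to build a COUNT(*) query from whichever fields the browser sent, without pasting user input into the SQL.

```python
import json
import sqlite3

# Hypothetical searchable columns; anything else in the request is ignored.
ALLOWED_FIELDS = {"year", "players", "difficulty"}


def count_results(conn, filters_json):
    """Return how many scenarios match the JSON-encoded filters."""
    filters = json.loads(filters_json)
    clauses, params = [], []
    for field, value in filters.items():
        if field not in ALLOWED_FIELDS or value in ("", None):
            continue  # unknown field, or the user left it blank
        clauses.append(f"{field} = ?")  # field name comes from the whitelist
        params.append(value)            # value is bound as a parameter
    sql = "SELECT COUNT(*) FROM scenario"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return conn.execute(sql, params).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scenario (id INTEGER PRIMARY KEY, year INTEGER, "
             "players INTEGER, difficulty TEXT)")
conn.executemany("INSERT INTO scenario (year, players, difficulty) VALUES (?, ?, ?)",
                 [(1941, 2, "easy"), (1941, 4, "hard"), (1944, 2, "hard")])

print(count_results(conn, '{"year": 1941}'))                        # 2
print(count_results(conn, '{"year": 1941, "difficulty": "hard"}'))  # 1
```

The browser would call this on every field change and write the returned number into the counter element.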
This may seem like a lot to take in but you can break each step down further and learn about the details bit by bit.
Hard to answer your question without knowing your level of expertise, but check out this short description of AJAX: http://blog.coderlab.us/rasmus-30-second-ajax-tutorial
If this makes some sense then your feature may be within reach "without too much trouble". If it seems impenetrable, then probably not.
I want to build something similar to Tunatic or Midomi (try them out if you're not sure what they do) and I'm wondering what algorithms I'd have to use; The idea I have about the workings of such applications is something like this:
1. have a big database with several songs
2. for each song in 1. reduce quality / bit-rate (to 64kbps for instance) and calculate the sound "hash"
3. have the sound / excerpt of the music you want to identify
4. for the song in 3. reduce quality / bit-rate (again to 64kbps) and calculate sound "hash"
5. if 4. sound hash is in any of the 2. sound hashes return the matched music
I thought of reducing the quality / bit-rate because of environment noise and encoding differences.
Am I on the right track here? Can anyone point me to any specific documentation or examples? Midomi seems to even recognize hums, which is awesomely impressive! How do they do that?
Do sound hashes exist or is it something I just made up? If they do, how can I calculate them? And more importantly, how can I check if child-hash is in father-hash?
How would I go about building a similar system with Python (maybe a built-in module) or PHP?
Some examples (preferably in Python or PHP) will be greatly appreciated. Thanks in advance!
I do research in music information retrieval (MIR). The seminal paper on music fingerprinting is the one by Haitsma and Kalker around 2002-03. Google should get you it.
I read an early (really early; before 2000) white paper about Shazam's method. At that point, they just basically detected spectrotemporal peaks, and then hashed the peaks. I'm sure that procedure has evolved.
Both of these methods address music similarity at the signal level, i.e., they are robust to environment distortions. I don't think they work well for query-by-humming (QBH). However, that is a different (yet related) problem with different (yet related) solutions, so you can find solutions in the literature. (Too many to name here.)
The ISMIR proceedings are freely available online. You can find valuable stuff there: http://www.ismir.net/
I agree with using an existing library like Marsyas. Depends on what you want. Numpy/Scipy is indispensable here, I think. Simple stuff can be written in Python on your own. Heck, if you need stuff like STFT, MFCC, I can email you code.
I worked on the periphery of a cool framework that implements several Music Information Retrieval techniques. I'm hardly an expert (edit: actually I'm nowhere close to an expert, just to clarify), but I can tell you that the Fast Fourier Transform is used all over the place with this stuff. Fourier analysis is wacky but its application is pretty straightforward. Basically you can get a lot of information about audio when you analyze it in the frequency domain rather than the time domain. This is what Fourier analysis gives you.
That may be a bit off topic from what you want to do. In any case, there are some cool tools in the project to play with, and you can view the source code for the core library itself: http://marsyas.sness.net
I recently ported my audio landmark-based fingerprinting system to Python:
https://github.com/dpwe/audfprint
It can recognize small (5-10 sec) excerpts from a reference database of 10s of thousands of tracks, and is quite robust to noise and channel distortions. It uses combinations of local spectral peaks, similar to the Shazam system.
This can only match the exact same track, since it relies on fine details of frequencies and time differences - it wouldn't even match different takes, certainly not cover versions or hums. As far as I understand, Midomi/SoundHound works by matching hums to each other (e.g. via dynamic time warping), then has a set of human-curated links between sets of hums and the intended music track.
Matching a hum directly to a music track ("Query by humming") is an ongoing research problem in music information retrieval, but is still pretty difficult. You can see abstracts for a set of systems evaluated last year at the MIREX 2013 QBSH Results.
MFCCs extracted from the music are very useful for finding the timbral similarity between songs. This is most often used to find similar songs. As pointed out by darren, Marsyas is a tool that can be used to extract MFCCs and find similar songs by converting the MFCCs into a single vector representation.
Other than MFCC, rhythm is also used to find song similarity. There are a few papers presented at MIREX 2009 that will give you a good overview of the different algorithms and features that are most helpful in detecting music similarity.
The MusicBrainz project maintains such a database. You can make queries to it based on a fingerprint.
The project has existed for a while and has used different fingerprints in the past. See here for a list.
The latest fingerprint they are using is AcoustId. There is the Chromaprint library (also with Python bindings) where you can create such fingerprints. You must feed it raw PCM data.
I have recently written a library in Python which does the decoding (using FFmpeg) and provides such functions as to generate the AcoustId fingerprint (using Chromaprint) and other things (also to play the stream via PortAudio). See here.
It's been a while since I last did signal processing, but rather than downsampling you should look at frequency-domain representations (e.g. FFT or DCT). Then you could make a hash of sorts and search for the database song that contains that sequence.
The tricky part is making this search fast (maybe some papers on gene search might be of interest). I suspect that iTunes also does some detection of instruments to narrow down the search.
I did read a paper about the method in which a certain music information retrieval service (no names mentioned) does it - by calculating the Short Time Fourier Transform over the sample of audio. The algorithm then picks out 'peaks' in the frequency domain, i.e. time positions and frequencies that have particularly high amplitude, and uses the time and frequency of these peaks to generate a hash. It turns out the hash has surprisingly few collisions between different samples, and also stands up against approximately 50% data loss of the peak information.
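To give a feel for the peak-hashing idea, here is a toy sketch in plain Python (standard library only, with a slow naive DFT instead of a real FFT). It is not the method of any particular service. The frame size, the one-peak-per-frame rule, the (f1, f2, dt) hash layout and the synthetic test tones are all simplifying assumptions, and the excerpt is cut on a frame boundary so the frames line up exactly.

```python
import cmath
import math

FRAME = 64  # samples per analysis frame (tiny, just for the demo)


def strongest_bin(frame):
    """Index of the loudest frequency bin in one frame, via a naive DFT."""
    n = len(frame)
    best_bin, best_mag = 0, -1.0
    for k in range(1, n // 2):  # skip the DC bin
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin


def peaks(samples):
    """One spectral peak per non-overlapping frame."""
    return [strongest_bin(samples[i:i + FRAME])
            for i in range(0, len(samples) - FRAME + 1, FRAME)]


def fingerprints(samples, fan_out=3):
    """Hash pairs of nearby peaks as (freq1, freq2, frames_between)."""
    p = peaks(samples)
    hashes = set()
    for i, f1 in enumerate(p):
        for dt in range(1, fan_out + 1):
            if i + dt < len(p):
                hashes.add((f1, p[i + dt], dt))
    return hashes


def tone(bin_index, frames):
    """Synthetic sine wave that lands exactly on one DFT bin."""
    return [math.sin(2 * math.pi * bin_index * t / FRAME)
            for t in range(frames * FRAME)]


track = tone(5, 2) + tone(9, 2) + tone(14, 2) + tone(20, 2)
excerpt = track[2 * FRAME:6 * FRAME]  # frame-aligned cut from the middle
other = tone(7, 2) + tone(11, 2)

print(fingerprints(excerpt) <= fingerprints(track))         # True
print(fingerprints(other).isdisjoint(fingerprints(track)))  # True
```

Real systems keep several peaks per frame, use overlapping windows, and store the time offset of each hash so that a match also has to be consistent in time; all of that is left out here.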
Currently I'm developing a music search engine using ActionScript 3. The idea is analyzing the chords first and marking the frames (it's limited to mp3 files at the moment) where the frequency changes drastically (melody changes and ignoring noises). After that I do the same thing to the input sound, and match the results with the inverted files. The matching one determines the matching song.
For Axel's method, I think you shouldn't worry about the query whether it's a singing or just humming, since you don't implement a speech recognition program. But I'm curious about your method which uses hash functions. Could you explain that to me?
The query by humming feature is more complicated than audio fingerprinting. The difficulty comes from:
How do you efficiently collect the melody database in a real-world application? Many demo systems build it from MIDI, but the cost of a MIDI-based solution is not affordable for a company.
How do you deal with time variance? For example, a user may hum fast or slow. Use DTW? Yes, DTW is a very good solution for time series with time variance, BUT it costs too much CPU load.
How do you build a time series index?
Here is a demo open source query-by-humming project that could be a reference: https://github.com/EmilioMolina/QueryBySingingHumming
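For reference, the DTW (dynamic time warping) mentioned above can be sketched in a few lines of plain Python. This is the textbook version, not code from the linked project. The pitch sequences are made-up MIDI note numbers, and the absolute difference is just one possible local cost.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


melody = [60, 62, 64, 62]
slow_hum = [60, 60, 62, 62, 64, 64, 62, 62]  # same tune at half speed
monotone = [60, 60, 60, 60]

print(dtw_distance(melody, slow_hum))  # 0.0
print(dtw_distance(melody, monotone))  # 8.0
```

The nested loops make the cost proportional to len(a) * len(b) for every candidate melody, which is where the CPU load comes from once the database gets large.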