I am trying to figure out a way to detect whether a similar input has already been entered into a MySQL database.
I do not mean duplicate entries, but entries that are similar without being exact. When data entry staff enter a name, the same name may be spelled differently from one person to another depending on how they hear it. I need my PHP code to detect whether an entry similar to the one being entered already exists, and to warn the staff to double-check whether it is the same name.
This cannot be done with MySQL alone; it needs to be handled by your code. Before inserting any data into a table, a query needs to be formulated and executed to find potentially similar data. The definition of what counts as similar depends strongly on your business needs.
Fuzzy search is not a strength of MySQL, but you might get away with one or several LIKE conditions. Searches for variations up to a Levenshtein distance of two, maybe three, might be done by calculating the possible variations with wildcard placeholders and combining them into one query. This approach is described in more detail by Gordon Lesti. Depending on the complexity of those queries and the traffic on the system, this has great potential to bring a MySQL instance down quite easily.
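As a rough illustration of the wildcard idea, here is a minimal sketch in Python (the same logic ports directly to PHP). It only covers an edit distance of one, and the table name `people` and the columns `id` and `name` are made up for the example. It relies on `_` matching exactly one character in a LIKE pattern and on backslash being MySQL's default LIKE escape character.

```python
def _escape_like_char(ch):
    # Escape characters that have a special meaning inside a LIKE pattern.
    return "\\" + ch if ch in "\\%_" else ch


def like_patterns(term):
    """Return LIKE patterns that match strings within edit distance 1 of term."""
    chars = [_escape_like_char(c) for c in term]
    patterns = {"".join(chars)}  # the exact spelling itself
    for i in range(len(chars)):
        # One character was typed differently: any single character may stand there.
        patterns.add("".join(chars[:i] + ["_"] + chars[i + 1:]))
        # The stored value is missing one character of the input.
        patterns.add("".join(chars[:i] + chars[i + 1:]))
    for i in range(len(chars) + 1):
        # The stored value has one extra character somewhere.
        patterns.add("".join(chars[:i] + ["_"] + chars[i:]))
    patterns.discard("")  # an empty pattern is useless
    return sorted(patterns)


def build_similar_name_query(term, table="people", column="name"):
    """Build one parameterized query that ORs all patterns together.

    The table and column names are hypothetical and must come from trusted
    code, never from user input.
    """
    patterns = like_patterns(term)
    where = " OR ".join(f"{column} LIKE %s" for _ in patterns)
    return f"SELECT id, {column} FROM {table} WHERE {where}", patterns
```

For a name of length n this produces roughly 3n patterns, which is why going to distance two or three grows quickly and can get expensive on a busy server.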
If performance is an issue, a search engine like Elasticsearch might help. When inserting data into MySQL, the same data would also be added as a document to Elasticsearch. This would allow searching for similar records using the fuzzy search capabilities of Elasticsearch, which perform far better than MySQL for this kind of query.
If the transactional safety of MySQL is not a requirement for the application, one might also opt to replace MySQL with Elasticsearch and use Elasticsearch not only for searching but also for persistence.
Related
I am working on a project where I have a database, that contains a summary field, which is filled in by a web form that visitors to the site enter on.
When the user finishes entering the summary field, I want to use the words the user entered to look up similar records in the database that contain the same keywords.
I was thinking I could split the summary string that is submitted and then loop through the array and build up a query so the query would end up something like:
SELECT *
FROM my_table
WHERE summary LIKE '%keyword1%'
OR summary LIKE '%keyword2%'
OR summary LIKE '%keyword3%';
However, this seems massively inefficient, and as the database could grow quite big, could potentially become quite a slow query to run.
I then found the MySQL IN clause, but this only seems to work with multiple values where a field can only contain 1 of these values in a row.
Is there a way I can use the IN function, or is there a better MySQL function that I can use to do what I want, or is my first idea the only way round it?
An example of what I am trying to achieve is a bit like on Stack Overflow. When you lose focus of the title field, it pops up similar questions based on the title you've provided.
I would recommend reading the manual page InnoDB FULLTEXT Indexes and the one on Full-Text Restrictions. New full-text functionality has been incorporated in recent releases of MySQL, extending its use to InnoDB tables.
Concerning the inability to upgrade the MySQL version, there is no reason why one cannot mix and match MyISAM and InnoDB tables in the same database. One would keep the textual information in MyISAM (where FTS indexes have historically been available) and join to InnoDB tables when needed. This avoids the "must upgrade to version 5.6" argument.
Legend: FTS=Full Text Search
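To make the full-text suggestion concrete, here is a small sketch of how the lookup could be built once a FULLTEXT index exists on the summary column (for example via ALTER TABLE my_table ADD FULLTEXT INDEX ft_summary (summary)). It is written in Python for brevity; the `id` column, the index name and the function name are assumptions, and the `%s` placeholders follow the style used by common MySQL client libraries.

```python
import re


def build_similar_summary_query(summary, limit=5):
    """Return (sql, params) for a natural-language full-text lookup.

    Assumes my_table has a FULLTEXT index on `summary` and an `id` column.
    """
    # Keep only word characters so punctuation from the form does not
    # end up in the search string.
    terms = " ".join(re.findall(r"\w+", summary))
    sql = (
        "SELECT id, summary, "
        "MATCH (summary) AGAINST (%s IN NATURAL LANGUAGE MODE) AS score "
        "FROM my_table "
        "WHERE MATCH (summary) AGAINST (%s IN NATURAL LANGUAGE MODE) "
        "ORDER BY score DESC LIMIT %s"
    )
    return sql, (terms, terms, limit)
```

Unlike the chain of LIKE '%keyword%' conditions, this form can use the index, and the score gives a natural ordering for a "similar entries" popup.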
So, the context is: I have a site in which many pages may need the information about one table, say for instance, 'films'. This table has many fields, like title, language, year, description, director... And perhaps in one page I need only the title and the id of some rows and in another I also need the description.
So the question is: should I code a database manager (I am using MySQL) that retrieves all the fields of the rows that satisfy a condition (I guess the WHERE clause should be passed as a parameter)? Or should I be able to specify which fields are needed? I think this cannot be done easily with mysqli (because prepared statements require specifying the number of fetched fields beforehand), so for this to work I would need to use PDO instead, which I haven't used yet. Is this last approach worth it? Or is there not really a big difference in performance if I retrieve all the information about those rows?
Thank you in advance.
Based upon the comments above, my answer to your question(s) is:
Retrieving some fields vs all fields isn't a real performance consideration until you are dealing with one or more CLOB/TEXT columns which have a lot of text in them. Good database practice indicates you should always specify which fields are returned from a query.
Any query against any table should have a where clause to restrict the number of rows returned. Especially if you are looking to query exactly one row.
Your question implies you are writing a wrapper layer around the queries to hide this complexity. Don't do this. Get an existing PHP library that does this work for you. See for example: "Good PHP ORM Library?". There are a number of subtle issues, like security, which you will overlook.
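To illustrate the first two points only (name the fields, always restrict with WHERE), here is a tiny sketch in Python. It is not meant as a replacement for an ORM. The column list for `films` is taken from the question; the function name and the whitelist structure are invented for the example. Identifiers cannot be bound as parameters, so they are checked against a whitelist, while the value in the WHERE clause stays a placeholder.

```python
# Hypothetical whitelist of tables and the columns callers may ask for.
ALLOWED_COLUMNS = {
    "films": {"id", "title", "language", "year", "description", "director"},
}


def build_select(table, columns, where_column, allowed=ALLOWED_COLUMNS):
    """Build 'SELECT <columns> FROM <table> WHERE <where_column> = %s'."""
    if table not in allowed:
        raise ValueError(f"unknown table: {table}")
    if not columns:
        raise ValueError("at least one column must be requested")
    unknown = [c for c in list(columns) + [where_column] if c not in allowed[table]]
    if unknown:
        raise ValueError(f"unknown column(s): {', '.join(unknown)}")
    return f"SELECT {', '.join(columns)} FROM {table} WHERE {where_column} = %s"
```

A page that only lists titles would call build_select("films", ["id", "title"], "year"), while a detail page would add "description" to the list.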
I have just been tasked with recovering/rebuilding an extremely large and complex website that had no backups and was fully lost. I have a complete (hopefully) copy of all the PHP files however I have absolutely no clue what the database structure looked like (other than it is certainly at least 50 or so tables...so fairly complex). All data has been lost and the original developer was fired about a year ago in a fiery feud (so I am told). I have been a PHP developer for quite a while and am plenty comfortable trying to sort through everything and get the application/site back up and running...but the lack of a database will be a huge struggle. So...is there any way to simulate a MySQL connection to some software that will capture all incoming queries and attempt to use the requested field and table names to rebuild the structure?
It seems to me that if I start clicking through the application and it passes a query for
SELECT name, email, phone FROM contact_table WHERE contact_id='1'
...there should be a way to capture that info and assume there was a table called "contact_table" that had at least 4 fields with those names... If I can do that repetitively, each time adding some sample data to the discovered fields and then moving on to another page, then eventually I should have a rough copy of most of the database structure (at least all public-facing parts). This would be MUCH easier than manually reading all the code and pulling out every reference, reading all the joins and subqueries, and sorting through it all manually.
Anyone ever tried this before? Any other ideas for reverse-engineering the database structure from PHP code?
mysql> SET GLOBAL general_log=1;
With this configuration enabled, the MySQL server writes every query to a log file (datadir/hostname.log by default), even those queries that have errors because the tables and columns don't exist yet.
http://dev.mysql.com/doc/refman/5.6/en/query-log.html says:
The general query log can be very useful when you suspect an error in a client and want to know exactly what the client sent to mysqld.
As you click around in the application, it should generate SQL queries, and you can have a terminal window open running tail -f on the general query log. As you see queries that reference tables or columns that don't exist yet, create those tables and columns. Then repeat clicking around in the app.
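If reading the log by eye gets tedious, part of it can be automated. The following is a rough Python sketch that pulls table and column names out of simple single-table SELECT statements, such as lines copied from the general query log. The function name and the regular expressions are my own and deliberately naive: joins, subqueries, aliases, backticks and SELECT * are not handled.

```python
import re
from collections import defaultdict

# Matches "SELECT <cols> FROM <table> [WHERE <conditions>]" for one table.
SELECT_RE = re.compile(
    r"SELECT\s+(?P<cols>.+?)\s+FROM\s+(?P<table>\w+)"
    r"(?:\s+WHERE\s+(?P<where>.+))?",
    re.IGNORECASE | re.DOTALL,
)
# A word directly followed by a comparison operator is treated as a column.
WHERE_COL_RE = re.compile(r"(\w+)\s*(?:=|<|>|LIKE\b)", re.IGNORECASE)


def infer_schema(queries):
    """Map each table name to the sorted column names seen in the queries."""
    schema = defaultdict(set)
    for query in queries:
        match = SELECT_RE.search(query)
        if not match:
            continue
        table = match.group("table")
        for col in match.group("cols").split(","):
            col = col.strip()
            # Skip '*', expressions and aliases; keep plain column names.
            if re.fullmatch(r"\w+", col):
                schema[table].add(col)
        if match.group("where"):
            schema[table].update(WHERE_COL_RE.findall(match.group("where")))
    return {table: sorted(cols) for table, cols in schema.items()}
```

Fed the contact_table query from the question, this yields the four columns contact_id, email, name and phone. Data types still have to be guessed by hand.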
A number of things may make this task even harder:
If the queries use SELECT *, you can't infer the names of columns or even how many columns there are. You'll have to inspect the application code to see what column names are used after the query result is returned.
If INSERT statements omit the list of column names, you can't know what columns there are or how many. On the other hand, if INSERT statements do specify a list of column names, you can't know if there are more columns that were intended to take on their default values.
Data types of columns won't be apparent from their names, nor string lengths, nor character sets, nor default values.
Constraints, indexes, primary keys, foreign keys won't be apparent from the queries.
Some tables may exist (for example, lookup tables), even though they are never mentioned by name by the queries you find in the app.
Speaking of lookup tables, many databases have sets of initial values stored in tables, such as all possible user types and so on. Without the knowledge of the data for such lookup tables, it'll be hard or impossible to get the app working.
There may have been triggers and stored procedures. Procedures may be referenced by CALL statements in the app, but you can't guess what the code inside triggers or stored procedures was intended to be.
This project is bound to be very laborious, time-consuming, and involve a lot of guesswork. The fact that the employer had a big feud with the developer might be a warning flag. Be careful to set the expectations so the employer understands it will take a lot of work to do this.
PS: I'm assuming you are using a recent version of MySQL, such as 5.1 or later. If you use MySQL 5.0 or earlier, you should just add log=1 to your /etc/my.cnf and restart mysqld.
Crazy task. Is the code such that the DB queries are at all abstracted? Could you replace the query functions with something which would log the tables, columns and keys, and/or actually create the tables or alter them as needed, before firing off the real query?
Alternatively, it might be easier to do some text processing, regex matching, grep/sort/uniq on the queries in all of the PHP files. The goal would be to get it down to a manageable list of all tables and columns in those tables.
I once had a similar task, fortunately I was able to find an old backup.
If you could find a way to extract the queries, say by regex matching all of the occurrences of mysql_query or whatever extension was used to query the database, you could then use something like php-sql-parser to parse the queries, and hopefully from that you would be able to get a list of most tables and columns. However, that is only half the battle. The other half is determining the data types for every single column, and that would be rather impossible to do automatically from PHP. It would basically require you to inspect the code line by line. There are best practices, but who's to say that the old dev followed them? Whether a column called "date" should be stored as DATE, DATETIME, INT, or VARCHAR(50) with some sort of ugly manual string handling can only be determined by looking at the actual code.
Good luck!
You could build some triggers with the BEFORE action time, but unfortunately this will only work for INSERT, UPDATE, or DELETE commands.
http://dev.mysql.com/doc/refman/5.0/en/create-trigger.html
I'm programming a search engine for my website in PHP, SQL and jQuery. I have experience in adding autocomplete with existing data in the database (i.e. searching article titles). But what if I want to use the most common search queries that users type, similar to what Google has, without having so many users to contribute to the creation of the data (most common queries)? Is there some kind of open-source SQL table with autocomplete data in it or something similar?
For now, use the static data that you have for autocomplete.
Create another table in your database to store the actual user queries. The schema of the table can be <queryID, query, count>, where count is incremented each time the same query is submitted by another user (a kind of rank). Build an n-gram index over the queries (so that you can also autocomplete something like "Manchester United" when a person types just "United", i.e. not only from the start of the string) and simply return the top N after sorting by count.
The above table will gradually keep improving as your user base grows.
One more thing: the algorithm for accomplishing your task is pretty simple. The real challenge lies in returning the data to be displayed within a fraction of a second. So when your query store grows, you can use a search engine like Solr or Sphinx to do the searching for you, which will be pretty fast at returning the results to be rendered.
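Here is a minimal in-memory sketch of that idea in Python, just to show the moving parts. The class and method names are invented, the index is word-level rather than a true character n-gram index, and a real site would keep the counts in the <queryID, query, count> table described above instead of in memory.

```python
from collections import Counter, defaultdict


class QuerySuggester:
    """Counts submitted queries and suggests the most frequent matches."""

    def __init__(self):
        self.counts = Counter()          # normalized query -> times submitted
        self.index = defaultdict(set)    # word -> queries containing that word

    def record(self, query):
        """Store one submitted query (equivalent to incrementing `count`)."""
        normalized = " ".join(query.lower().split())
        if not normalized:
            return
        self.counts[normalized] += 1
        for word in normalized.split():
            self.index[word].add(normalized)

    def suggest(self, typed, limit=5):
        """Return up to `limit` queries with a word starting with `typed`."""
        prefix = typed.strip().lower()
        if not prefix:
            return []
        candidates = set()
        for word, queries in self.index.items():
            if word.startswith(prefix):
                candidates |= queries
        # Most frequent first; ties broken alphabetically for stable output.
        return sorted(candidates, key=lambda q: (-self.counts[q], q))[:limit]
```

Because every word of a query is indexed, typing "United" can surface "manchester united" even though the query does not start with that word. Scanning all index keys is fine for a small site; at larger sizes this is the part a search engine would take over.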
You can use the Lucene search engine for this functionality. Refer to this link
or you may also give look to Lucene Solr Autocomplete...
Google has (and keeps accumulating) thousands of entries, arranged by factors such as day, time, geolocation and language, and the set keeps growing with what users enter. Whenever a user types a word, the system checks the table of "most used words for that location + day + time" and, if there is no answer, falls back to "general words". So you should categorize every word entered by users, or build a general word-relation table in your database that references the most suitable searched answers.
Yesterday I stumbled on something that answered my question. Google draws autocomplete suggestions from this XML file, so it is wise to use it if you have too few users to create your own database of keywords:
http://google.com/complete/search?q=[keyword]&output=toolbar
Just replacing [keyword] with some word will give suggestions for that word. Then the task is just to parse the returned XML and format the output to suit your needs.
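A small sketch of the parsing step, in Python with only the standard library. The XML shape assumed here (suggestion elements carrying a data attribute) is based on what this endpoint has returned when I looked at it; it is not a documented API, so treat the element and attribute names as assumptions. The function names are made up, and the sketch does not perform the HTTP request itself.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_suggest_url(keyword):
    """Fill the keyword into the URL given above, with proper escaping."""
    query = urlencode({"q": keyword, "output": "toolbar"})
    return "http://google.com/complete/search?" + query


def parse_suggestions(xml_text):
    """Extract suggestion strings from the returned XML.

    Assumes each suggestion is a <suggestion data="..."/> element.
    """
    root = ET.fromstring(xml_text)
    return [
        node.attrib["data"]
        for node in root.iter("suggestion")
        if "data" in node.attrib
    ]
```

The resulting list can then be serialized to JSON and handed to whatever autocomplete widget the page uses.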
I've grown quite fond of jsfiddle and how easy it is to use.
Does anyone know of something that works with mysql and maybe php mixed in?
You might be interested in my site: http://sqlfiddle.com. I've built it only recently, but it does support a decent range of database types (including MySQL) and has gotten a fair amount of use lately here on StackOverflow (see the mention on the sql wiki). You can build indexes, and views, and do nearly anything you would normally want to do within a database. Be sure to check out some of the sample fiddles, or see how various other SO users are using it:
MySQL query - optimized -> http://sqlfiddle.com/#!2/1fde2/39
Multilevel Users in the Database table -> http://sqlfiddle.com/#!2/0de1f/7
How to compare a value with a csv value in mysql? -> http://sqlfiddle.com/#!2/b642c/4
I guess I should mention one other potentially useful feature for SO: each query displays its execution plan, so if multiple people submitted answers to a SQL question, you could easily evaluate their efficiency and then upvote/accept accordingly.
Try SQLize.
It has some annoying limitations, like the inability to create views, but overall I find it very useful. (Tip: CREATE INDEX also doesn't work, but you can still create indexes inside CREATE TABLE.)