Some problems searching mysql database via php - php

So I have this website that has a search feature which searches a table in my mysql database. The database at the moment has 1108 rows. It contains music info such as Artist and Album. Since it's possible for any character to appear in an artist name or album name, I've urlencoded each of those variables before adding them to the database. See below:
$artist = urlencode($_POST['artist']);
$album = urlencode($_POST['album']);
So now let's pretend that I have added a new entry to the database and it contains characters that needed to be urlencoded. The database shows it fine.
Now I want to go search.
Foreign characters worked. You can see here: http://albumarrrt.net/details.php?artist=Ai%20Otsuka clicking the album link for each one works.
But now a few problems occur.
1 - If you search for '&' the search reads the %26 as nothing. It shows %26 in the address bar, but it reads it as nothing.
Here is how it is being read:
$search = $_GET['search'];
if($search == '') {
echo "Please enter a search term :(";
}
That is the only thing done with $search before it starts getting read by the database.
2 - If you search for a single or double quotes, it does some weird stuff. Example:
Search for " and get: No matches found for "%5C%5C%26quot%3B"
Search for ' and get: No matches found for "%5C%5C%26%23039%3B"
I don't understand why it does this, because the database only contains the code for the quote and nothing else.
Those are the only two things I have found wrong with my search. Maybe I have just been looking at it too long and can't figure it out, but it perplexes me why it doesn't read '&' as anything.
Onto my last question.
My current searching method separates each word and adds %'s around it and then uses the LIKE statement to find matches. example:
Search: A bunch of Stuff (word)
the mysql query would be like:
SELECT * FROM TABLE WHERE (album LIKE '%A%' AND album LIKE '%bunch%' AND album LIKE '%of%' AND album LIKE '%Stuff%' AND album LIKE '%%28word%29%') OR (artist LIKE '%A%' AND artist LIKE '%bunch%' AND artist LIKE '%of%' AND artist LIKE '%Stuff%' AND artist LIKE '%%28word%29%')
Obviously this is putting a lot of strain on the server, and I know using LIKE statements for such large database searching is a bad idea, so what would be an alternate way of searching FULL TEXT or some other method?
Sorry for the overwhelming amount of questions, But they all sorta go hand-in-hand with each other.
edit:
Ok I've fixed my database up, but still have a few questions.
Someone suggested converting my text from utf8 to plain utf. How would I do this?
Also, I am still getting the problem with the & sign.
For example:
if you search for & on Google it works. On my site, however, the search query that my script receives is empty when searching for &.

First: don't urlencode data in the database. Urlencode data after you fetch it, as you output to HTML.
Second: do use query parameters when you use user-supplied values in SQL queries. Then you don't have to worry about quotes in the form data causing syntax errors or SQL injection risks.
Third: don't use the LIKE '%pattern%' hack; use a real fulltext search solution instead (either FULLTEXT or Lucene/Solr or Sphinx Search). It'll have performance hundreds or thousands of times better than using ad-hoc text searching (depending on your volume of data).
See the presentation I did for MySQL University: Practical Full Text Search in MySQL.
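A minimal sketch of the second and third points combined, assuming a hypothetical `albums` table with a FULLTEXT index on `(artist, album)`. The function name and the table layout are made up for illustration and are not the asker's actual schema. It turns the raw search string into a boolean-mode expression and passes it through a placeholder instead of pasting it into the SQL text.

```php
<?php
// Build a parameterized FULLTEXT query from a raw search string.
// Assumes a hypothetical table `albums` with FULLTEXT(artist, album).
function build_fulltext_search(string $search): array
{
    // Drop characters that act as operators in boolean mode.
    $clean = preg_replace('/[+\-<>()~*"@]+/', ' ', $search);
    $words = preg_split('/\s+/', trim($clean), -1, PREG_SPLIT_NO_EMPTY);

    // Prefix each word with "+" so every word must be present.
    $expr = implode(' ', array_map(fn ($w) => '+' . $w, $words));

    $sql = 'SELECT artist, album FROM albums '
         . 'WHERE MATCH(artist, album) AGAINST (? IN BOOLEAN MODE)';

    return [$sql, [$expr]];
}

[$sql, $params] = build_fulltext_search('A bunch of Stuff (word)');
// $params[0] is now "+A +bunch +of +Stuff +word".
// With a PDO connection in $pdo this would run as:
//   $stmt = $pdo->prepare($sql);
//   $stmt->execute($params);
```

Very short or very common words may be ignored or treated differently depending on the storage engine's minimum word length and stopword settings, so the exact matching behaviour should be checked against your own data.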

I don't see why you need to urlencode, I would simply use mysql_real_escape_string.
'&' is a separator in a url so it won't be passed to your script unless you urlencode it first
Another problem with urlencode is the large number of extra characters. mySQL may silently truncate the artist or title if you haven't allowed for enough characters.
DC
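A small sketch of the separator behaviour described above, using only PHP's built-in query string functions. The value `AC&DC` is just an illustrative search term.

```php
<?php
// An unencoded "&" ends the current parameter and starts a new one.
parse_str('search=AC&DC', $broken);
// $broken['search'] is "AC"; "DC" became a separate, empty parameter.

// Encoding the value when the link is built keeps it in one piece.
$query = http_build_query(['search' => 'AC&DC']);   // "search=AC%26DC"
parse_str($query, $intact);
// $intact['search'] is "AC&DC": PHP decodes %26 back to "&" by itself.
```

PHP applies the same decoding when it fills `$_GET`, so the encoding belongs where the URL is built, not in the stored data.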

Are you sure you don't want to be decoding the things coming from your URLs (and POSTs) before placing them in the database? If I were storing various strings, I would want to decode them to plain UTF or something and store them that way. Then I would re-encode them to display them. This might solve your search problem in and of itself.
Second, to speed up strings search access, you could create a strings table with all of your strings tokenized, and linked back to the strings that contain them. Then instead of doing a "like %$1%" you can say where $1 = stringTable.String and join against that ID. By no means count this as the optimal solution as I haven't done those performance tunes myself, it's just a suggestion.
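The token-table idea can be sketched in memory as follows. This is only an illustration under assumptions: the function names are hypothetical, the array stands in for a real tokens table, and in a database the lookup would be an equality match on an indexed token column joined back to the original rows.

```php
<?php
// Split a string into unique lowercase word tokens.
// strtolower() only lowercases ASCII; mb_strtolower() would be the
// multibyte-aware choice where the mbstring extension is available.
function tokenize(string $text): array
{
    $words = preg_split('/[^\p{L}\p{N}]+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return array_values(array_unique($words));
}

// Build token => list of row ids, standing in for the tokens table.
function build_index(array $rows): array
{
    $index = [];
    foreach ($rows as $id => $text) {
        foreach (tokenize($text) as $token) {
            $index[$token][] = $id;
        }
    }
    return $index;
}

// Return the ids of rows that contain every word of the search.
function search_index(array $index, string $search): array
{
    $result = null;
    foreach (tokenize($search) as $token) {
        $ids = $index[$token] ?? [];
        $result = $result === null ? $ids : array_values(array_intersect($result, $ids));
    }
    return $result ?? [];
}

$index = build_index([1 => 'Ai Otsuka', 2 => 'A bunch of Stuff', 3 => 'Stuff and nonsense']);
$hits = search_index($index, 'stuff bunch');   // only row 2 has both words
```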

Related

inserting links in mysql database table cell

I have a notification system on my website where users will be notified of a certain event. The message of the notification will be stored in a MySQL database.
When I insert links into the message, it returns a MySQL error saying it cannot insert links.
How can I achieve that? I am using php.
Almost all links will include characters that will be interpreted by a sql database in an odd manner, leading to unpredictable results based on the varied input. You should ensure these characters are properly escaped:
$some_input = getInputFromWebPage();
$safe_input = mysql_real_escape_string($some_input);
insertIntoDB($safe_input);
That is just an example, and obviously not working code, but hopefully it heads you in the right direction. There are several functions that will add escape characters to strings for you.
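As an alternative to escaping by hand, here is a hedged sketch that uses a PDO prepared statement, which is a different technique from the `mysql_real_escape_string` call above. The `notifications` table and its columns are hypothetical, and an in-memory SQLite database is used only so the example runs without a server.

```php
<?php
// SQLite in memory is used only so the sketch runs without a server;
// for MySQL the DSN would look like "mysql:host=...;dbname=...".
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical notifications table.
$pdo->exec('CREATE TABLE notifications (id INTEGER PRIMARY KEY, message TEXT)');

$message = 'New reply: <a href="http://example.com/?a=1&b=\'x\'">view</a>';

// The placeholder keeps the message out of the SQL text entirely.
$stmt = $pdo->prepare('INSERT INTO notifications (message) VALUES (?)');
$stmt->execute([$message]);

$stored = $pdo->query('SELECT message FROM notifications')->fetchColumn();
// $stored is identical to $message, quotes and ampersand included.
```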

PHP MySQL Name Score separation system

I've got a website which lists sports scores. It currently works like this:
Firstname Lastname 1-0 Firstname Lastname
It explodes this based on spaces, then explodes the third one (containing the scores) based on the - .
The problem with this is that it does not support names with more than 2 words. If I explode using - first, it would not support names with - in there. The results are added in a textarea, because I have many thousands that need to be added, so I don't want to make multiple fields to input data into, as I can currently add matches quickly listing one result per line. Does anyone have advice on how to make the system both multi-word, and special character-insensitive? Is there maybe a way to split when it encounters a number, then select the first chunk as the first name, the last as that players score, and the rest as the last name?
I don't know if there's any way to teach a simple parsing command, or even a regular expression, to do what you want. Some cases will always be ambiguous. For example, if you have the names "Mary Ann Steiner" and "Constantin Van Dyke" the patterns are exactly the same, but one needs to be split (2/1) and the other needs to be split (1/2).
You could possibly find a library that knows how to make educated guesses based on a huge dictionary of known names, but failing that...
I think in this case you need the human brain inputting the data to make some of the decisions, and indicate them during data entry. In my experience using multiple fields isn't that slow if you navigate using the tab key instead of mousing around. You could also enter the data using a delimiter of your own, like:
Mary Ann,Steiner,2-3
Constantin,Van Dyke,4-2
Then you'd run something that explodes those lines based on "," and enters the elements into your db.
If you're copy/pasting or scraping the data from an external site, another option would be to just explode every line using the method you're currently using. This should work for most records, and when it doesn't work, it will be obvious -- the resulting record will have too many elements. You can have your script flag just those records for human intervention.
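A minimal sketch of the delimiter approach combined with the flagging idea, assuming the comma-separated `First,Last,score` layout shown above. The function name and the returned array shape are illustrative choices, not part of the original system.

```php
<?php
// Parse "First,Last,2-3" lines; anything that does not fit is flagged.
function parse_results(string $text): array
{
    $parsed = [];
    $flagged = [];
    foreach (preg_split('/\R/', trim($text)) as $line) {
        $fields = array_map('trim', explode(',', $line));
        if (count($fields) === 3 && preg_match('/^(\d+)-(\d+)$/', $fields[2], $m)) {
            $parsed[] = [
                'first' => $fields[0],
                'last'  => $fields[1],
                'score' => [(int) $m[1], (int) $m[2]],
            ];
        } else {
            $flagged[] = $line;   // leave for a human to fix
        }
    }
    return [$parsed, $flagged];
}

[$ok, $review] = parse_results("Mary Ann,Steiner,2-3\nConstantin,Van Dyke,4-2\nBroken line 1-0");
// $ok holds the two well-formed records; $review holds "Broken line 1-0".
```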

Do values coming directly from the database need to be escaped?

Do I need to escape/filter data that is coming from the database? Even if said data has already been "escaped" once (at the point in time where it was inserted into the database).
For example, say I allow users to submit blog posts via a form that has a title input and a textarea input.
A malicious user submits the blog post
title: Attackposttitle');DROP TABLE posts;--
textarea: Hahaha nuked ur site noobzors!
Now as this is being inserted into my database, I am going to escape it with mysql_real_escape_string, but once it is in the database I will later reference this data in my php blog application with something like this:
$sql="SELECT posttitle FROM posts WHERE id=50";
$posttitlearray = mysql_fetch_array(mysql_query($sql));
This is where my concern is, what if I, for example, run the following query to get the post content:
$sql="SELECT postcontent FROM posts WHERE posttitle=$posttitlearray[posttitle]";
In theory am I not sql injecting myself? IE, am I not effectively running the query:
$sql="SELECT postcontent FROM posts WHERE posttitle=Attackposttitle');DROP TABLE posts;--";
Or does the "Attackposttitle');DROP TABLE posts;--" data continue to be escaped once it is in the database?
Do I need to continually escape it like so:
$sql="SELECT postcontent FROM posts WHERE posttitle=mysql_real_escape_string($posttitlearray[posttitle])";
Or is the data safe once it has been escaped initially upon first being inserted into the database?
Thanks Stack!
It does not continue to be escaped once it's put in the database. You'll have to escape it again.
$sql="SELECT postcontent FROM posts WHERE posttitle='".mysql_real_escape_string($posttitlearray['posttitle'])."'";
The value should be escaped every time just before insertion to SQL query. Not for magical security reasons, but just to be sure that the syntax of the resultant query is OK.
Escaping the string sounds magical to many people, something like a shield against some mysterious danger, but in fact it is nothing magical. It is just the way to let special characters pass through the query intact.
The best would be just to have a look what escaping really does. Say the input string is:
Attackposttitle');DROP TABLE posts;--
after escaping:
Attackposttitle\');DROP TABLE posts;--
In fact it escaped only the single quote. That's the only thing you need to ensure: that when you insert the string in the query, the syntax will be OK!
insert into posts set title = 'Attackposttitle\');DROP TABLE posts;--'
It's nothing magical like danger shield or something, it is just to ensure that the resultant query has the right syntax! (of course if it doesn't, it can be exploited)
The query parser then looks at the \' sequence and knows that it is still the variable, not ending of its value. It will remove the backslash and the following will be stored in the database:
Attackposttitle');DROP TABLE posts;--
which is exactly the same value as user entered. And which is exactly what you wanted to have in the database!!
So this means that if you fetch that string from the database and want to use it in a query again, you need to escape it again to be sure that the resultant query has the right syntax.
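The round trip can be sketched as follows. Note the assumptions: this swaps `mysql_real_escape_string` for PDO prepared statements, and it uses an in-memory SQLite database purely so the example runs on its own. The table layout mirrors the question, but the exact column types are guesses.

```php
<?php
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE posts (id INTEGER PRIMARY KEY, posttitle TEXT, postcontent TEXT)');

$title = "Attackposttitle');DROP TABLE posts;--";

$insert = $pdo->prepare('INSERT INTO posts (id, posttitle, postcontent) VALUES (?, ?, ?)');
$insert->execute([50, $title, 'Hahaha nuked ur site noobzors!']);

// The stored title is the raw string, with no escaping left in it.
$fetched = $pdo->query('SELECT posttitle FROM posts WHERE id = 50')->fetchColumn();

// So it is passed as a parameter again when it goes into the next query.
$select = $pdo->prepare('SELECT postcontent FROM posts WHERE posttitle = ?');
$select->execute([$fetched]);
$content = $select->fetchColumn();
```

The value read back from the database is treated exactly like fresh user input: it never becomes part of the SQL text.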
But, in your example, very important thing to mention is the magic_quotes_gpc directive!
This feature escapes all the user input automatically (gpc - _GET, _POST and _COOKIE). This is an evil feature made for people not aware of sql injection. It is evil for two reasons. First reason is that then you have to distinguish the case of your first and second query - in the first you don't escape and in the second you do. What most people do is to either switch the "feature" off (I prefer this solution) or unescape the user input at first and then escape it again when needed. The unescape code could look like:
function stripslashes_deep($value)
{
    return is_array($value) ?
        array_map('stripslashes_deep', $value) :
        stripslashes($value);
}
if (get_magic_quotes_gpc()) {
    $_POST = stripslashes_deep($_POST);
    $_GET = stripslashes_deep($_GET);
    $_COOKIE = stripslashes_deep($_COOKIE);
}
The second reason why this is evil is because there is nothing like "universal quoting".
When quoting, you always quote text for some particular output, like:
string value for mysql query
like expression for mysql query
html code
json
mysql regular expression
php regular expression
For each case, you need different quoting, because each usage is present within different syntax context. This also implies that the quoting shouldn't be made at the input into PHP, but at the particular output! Which is the reason why features like magic_quotes_gpc are broken (never forget to handle it, or better, assure it is switched off!!!).
So, what methods would one use for quoting in these particular cases? (Feel free to correct me, there might be more modern methods, but these are working for me)
mysql_real_escape_string($str)
mysql_real_escape_string(addcslashes($str, "%_"))
htmlspecialchars($str)
json_encode() - only for utf8! I use my function for iso-8859-2
mysql_real_escape_string(addcslashes($str, '^.[]$()|*+?{}')) - you cannot use preg_quote in this case because backslash would be escaped two times!
preg_quote()
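A short sketch of a few of these contexts applied to one sample string, to show that each output needs its own quoting. The sample string is made up, and the `mysql_*` cases are left out because they need a live connection.

```php
<?php
$str = 'Tom & "Jerry" 100%_sure (a.b)';

// HTML output: &, quotes and angle brackets become entities.
$html = htmlspecialchars($str, ENT_QUOTES, 'UTF-8');

// JSON output: wrapped in double quotes, inner quotes backslash-escaped.
$json = json_encode($str);

// PHP regular expression: metacharacters such as ( ) . are backslash-escaped.
$regex = '/' . preg_quote($str, '/') . '/';

// LIKE pattern: only the wildcards % and _ are escaped here; the result
// still has to be passed as a bound parameter or escaped as a string value.
$like = '%' . addcslashes($str, '%_') . '%';
```

The same input ends up as four different strings, which is why quoting at input time cannot be right for all of them.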
Try using bind variables, which remove the need to escape your data completely.
http://php.net/manual/en/function.mssql-bind.php
The only downside is that you're restricted to using them with stored procedures in SQL Server; with other databases you can use them for everything.

remove similar characters that appear in all rows

So I have a table with two columns "title" and "url". The rows go as such:
Title url
Galago - Wikipedia http://en.wikipedia.org/wiki/Galago
Characteristics - Wikipedia http://en.wikipedia.org/wiki/Galago
Classification - Wikipedia http://en.wikipedia.org/wiki/Galago
Myst- Gamestop http://www.gamestop.com/ds/games/myst/69424
Plot- Gamestop http://www.gamestop.com/ds/games/myst/69424
My question is, how would I remove the common characters that are present in all rows from a certain url (remove - Wikipedia from the first three, and - Gamestop from the other 2)? This is just a minor example. I have many other rows that follow the same pattern (they have common characters or words that recur in all of the rows from a certain url). I wanted to add that I store these values from a JavaScript array.
If all of your strings are in the format shown above for the title column, I think the best approach may be to apply a regular expression to the title before inserting into the database table. This regular expression could capture all data preceding the "-" character and discard the "duplicate" data succeeding the "-".
Info on regular expressions on strings in PHP can be found here: http://php.net/manual/en/function.preg-match.php
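A minimal sketch of that regular expression, assuming every title ends with a hyphen followed by the site name. It uses `preg_replace` rather than `preg_match`, and the function name is hypothetical.

```php
<?php
// Remove the last "- Something" suffix from a title, if there is one.
function strip_site_suffix(string $title): string
{
    // Optional spaces, a hyphen, optional spaces, then hyphen-free text to the end.
    return preg_replace('/\s*-\s*[^-]+$/', '', $title) ?? $title;
}

$a = strip_site_suffix('Galago - Wikipedia');   // "Galago"
$b = strip_site_suffix('Myst- Gamestop');       // "Myst"
$c = strip_site_suffix('Spider-Man - Wikipedia'); // "Spider-Man"
```

Only the final hyphenated segment is removed, so hyphens inside the real title survive, but a title whose genuine last part follows a hyphen would be cut as well.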
I think that most automated solutions to this risk removing data that you want to keep. A word or phrase that occurs on more than one row is not necessarily redundant. A couple of potential, but still unreliable, methods come to mind. These would work only if you are looking for whole words.
Read all the titles into an array, and create a wordlist array by splitting each title into words. You can then determine the frequency of each word, and use that information to remove the unwanted words from the titles. If you have a lot of data, this method could use a lot of memory...
Parse each URL, extract the hostname, split it using a period (.) as the delimiter, and then search for and remove occurrences of those strings from the title. You might choose to create a whitelist of strings to ignore, like www, com, co, uk, net, org, and so on. This method may work if the unwanted words are found in the domain name (as in your examples).
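The hostname method could look roughly like this. It is a sketch under assumptions: the function name, the ignore list, and the rule that skips host parts shorter than three characters are illustrative choices, and it remains as unreliable as described above.

```php
<?php
// Remove words from the title that also appear as parts of the URL's hostname.
function strip_host_words(
    string $title,
    string $url,
    array $ignore = ['www', 'com', 'co', 'uk', 'net', 'org']
): string {
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return $title;   // no usable hostname, leave the title alone
    }
    foreach (explode('.', strtolower($host)) as $part) {
        // Skip ignored and very short parts such as "en".
        if (strlen($part) < 3 || in_array($part, $ignore, true)) {
            continue;
        }
        $title = preg_replace('/\b' . preg_quote($part, '/') . '\b/i', '', $title);
    }
    // Trim the separator left behind at either end.
    return trim($title, " -\t");
}

$a = strip_host_words('Galago - Wikipedia', 'http://en.wikipedia.org/wiki/Galago');
$b = strip_host_words('Myst- Gamestop', 'http://www.gamestop.com/ds/games/myst/69424');
// $a is "Galago" and $b is "Myst".
```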
You could normalize out the url info into another table...so like take the url column and make it url_id and create a url table that provides a url column and a title column. Title would be like Wikipedia or Gamestop etc. Then in the original table store the title with just the title not including the url title.
Maybe that won't work very well with the queries you are trying to do, but in that way you could probably search by url, url title, or title or any combination of those pretty easily.

How to make MySQL treat underscore as a word separator for fulltext search?

I am using MySQL fulltext and PHP (codeigniter) to search a database containing RSS items. The problem is that some of these items' titles use underscores instead of spaces. Since MySQL considers underscores part of a word, these items will never be matched in the search unless the user types the exact title including underscores.
Server is shared so I don't have access to MySQL Server System Variables.
Can this behavior be changed in some other way?
Can this maybe be done through the search query itself?
I know I could just replace all underscore occurrences in the DB by spaces, but this would compromise the original integrity of those titles though. Just wondering if there's another way of doing this.
Instead of replacing underscores in the original title field, you can use a separate field dedicated to fulltext searches.
This lets you replace underscores there, and also aggregate keywords into this field (category names, authors, tags, etc.) to improve the relevance of search results.
We have used this approach successfully many times to get rid of HTML tags in content that were interfering with search.
I don't think this can be done without access to the server. The only way I have ever seen to do it is the first comment on this mySQL manual page ("How I added '-' to the list of word characters"). It requires stopping the server and changing internal configuration.
Your best bet is probably creating a second column with removed underscores, and to search that.
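A minimal sketch of how the text for such a separate search column might be built in PHP before it is saved. The column name `search_text` and the function name are hypothetical; the FULLTEXT index would then be defined on that column instead of on the title.

```php
<?php
// Build the text for a hypothetical `search_text` column; the original
// title column is left untouched.
function build_search_text(string $title, array $keywords = []): string
{
    $text = strip_tags($title);                 // drop HTML tags
    $text = str_replace('_', ' ', $text);       // underscores become word breaks
    $text .= ' ' . implode(' ', $keywords);     // optional extra keywords
    return trim(preg_replace('/\s+/', ' ', $text));
}

$searchText = build_search_text('<b>Some_RSS_item_title</b>', ['news', 'php']);
// "Some RSS item title news php"
```

The search column has to be rebuilt whenever the title or its keywords change, otherwise the two drift apart.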
