I have a text file that looks like this:
<http://dbpedia.org/resource/Autism> <http://www.w3.org/2000/01/rdf-schema#comment> "Autism is a disorder of neural development characterized by impaired social interaction and communication, and by restricted and repetitive behavior. The diagnostic criteria require that symptoms become apparent before a child is three years old. Autism affects information processing in the brain by altering how nerve cells and their synapses connect and organize; how this occurs is not well understood."#en .
<http://dbpedia.org/resource/Anarchism> <http://www.w3.org/2000/01/rdf-schema#comment> "Anarchism is generally defined as the political philosophy which holds the state to be undesirable, unnecessary, and harmful, or alternatively as opposing authority and hierarchical organization in the conduct of human relations. Proponents of anarchism, known as \"anarchists\", advocate stateless societies based on non-hierarchical voluntary associations. There are many types and traditions of anarchism, not all of which are mutually exclusive."#en .
<http://dbpedia.org/resource/Achilles> <http://www.w3.org/2000/01/rdf-schema#comment> "In Greek mythology, Achilles was a Greek hero of the Trojan War, the central character and the greatest warrior of Homer's Iliad. Plato named Achilles the most handsome of the heroes assembled against Troy. Later legends (beginning with a poem by Statius in the 1st century AD) state that Achilles was invulnerable in all of his body except for his heel. As he died because of a small wound on his heel, the term Achilles' heel has come to mean a person's principal weakness."#en .
I'm using code (not relevant here) to extract the name of the article in the first url in each line. Then I extract the first sentence of the description between quotes. The problem is when I try to insert that first sentence string into my table, the insert fails (echoing works fine). Just inserting the title without the description works fine. Does anyone have any idea why the description makes the insert fail?
Here's the code I'm using to get the first sentence:
$data = fgets($handle); //get line
$data = str_replace("> ", "-!-", $data);
$dataArr = explode("-!-", $data);
//Get last part of uri from 1st element in array
$title = getLastPartOfUrl($dataArr[0]);
$desc=preg_replace('/(.*?[?!.](?=\s|$)).*/', '\\1', escape(substr($dataArr[2],1)));
$db->query("insert into mytable SET title = '".$title."', desc ='".$desc."'");
function escape($str)
{
$search=array("\\","\0","\n","\r","\x1a","'",'"');
$replace=array("\\\\","\\0","\\n","\\r","\Z","\'",'\"');
return str_replace($search,$replace,$str);
}
EDIT: I tried both urlencode and addslashes to no avail, in both cases including the $desc string makes the insert fail.
You don't escape the title.
I wouldn't trust your escape function either. I'm not sure what $db is, but you should use properly parameterized queries with PDO or mysqli.
EDIT: DESC is a reserved word in MySQL. You need to surround it (when used as a column name) with backticks in your query.
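To illustrate both points, here is a minimal sketch in Python, using the standard-library sqlite3 module as a stand-in for MySQL (an assumption; the question uses PHP and an unknown $db wrapper). The placeholders let the driver handle quoting of the description, and the column name desc is wrapped in backticks, which SQLite also accepts for MySQL compatibility. The helper name insert_article and the sample text are hypothetical.

```python
import sqlite3

def insert_article(conn, title, desc):
    # Placeholders (?) keep quotes and backslashes in the text from
    # breaking the statement; `desc` is quoted because it is a keyword.
    conn.execute("INSERT INTO mytable (title, `desc`) VALUES (?, ?)", (title, desc))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (title TEXT, `desc` TEXT)")
insert_article(conn, "Anarchism",
               'Proponents, known as "anarchists", advocate stateless societies.')
rows = conn.execute("SELECT title, `desc` FROM mytable").fetchall()
print(rows)
```

With PDO or mysqli the same idea applies: prepare the statement once with placeholders and bind $title and $desc instead of concatenating them into the SQL string.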
I'm working on a PHP + MySQL application that crawls a hard disk or shared drive and indexes all files and directories into a database, to provide "fulltext" search over them. So far it's going well, but I'm stuck on the question of whether I chose a good way to store the data in the database.
The picture below shows part of my database schema. The idea is that I save a domain (which represents the part of the disk I want to index), then there are links (which represent files and folders, with content, file path, etc.), and then I have a table that stores the unique keywords I find in file/folder names or content.
Finally, I have 16 linkkeyword tables to store the relations between links and keywords. I have 16 of them because I thought it might be good to build something like a hash table, since I'm expecting a high number of link <-> keyword relations (so far, for 15k links and 400k keywords, I have about 2.5 million linkkeyword records). To avoid storing that much data in one table (and later searching over it), I thought this hash table could be faster. It works like this: when I want to search for a word, I compute its md5 and look at the first character of the hash, which tells me which linkkeyword table to use. So there are only about 150~200k records in each linkkeyword table (versus 2.5 million).
So I'm curious whether this approach is of any use, or whether it would be better to store all linkkeyword information in a single table and let MySQL take care of it (and how many link <-> keyword relations can it handle?).
So far this was a great solution for me, but I crashed hard when I tried to implement regular-expression search. The user can enter e.g. "tem*", which can match temp, temporary, temple, etc. In a normal word search, I compute the md5 hash and then know which linkkeyword table to look in. But for a regular expression I need to get all keywords from the keywords table that match the expression and then process them one by one.
I'm also attaching part of the code for the normal keyword search:
private function searchKeywords($selectedDomains) {
$searchValues = $this->searchValue;
$this->resultData = array();
foreach (explode(" ", $searchValues) as $keywordName) {
$keywordName = strtolower($keywordName);
$keywordMd5 = md5($keywordName);
$selection = $this->database->table('link');
$results = $selection->where('domain.id', $selectedDomains)->where('domain.searchable = ?', '1')->where(':linkkeyword' . $keywordMd5[0] . '.keyword.keyword LIKE ?', $keywordName)
->select('link.*,:linkkeyword' . $keywordMd5[0] . '.weight,:linkkeyword' . $keywordMd5[0] . '.keyword.keyword');
foreach ($results as $result) {
$keyExists = array_key_exists($result->linkId, $this->resultData);
if ($keyExists) {
$this->resultData[$result->linkId]->updateWeight($result->weight);
$this->resultData[$result->linkId]->addKeyword($result->keyword);
} else {
$domain = $result->ref('domain');
$linkClass = new search\linkClass($result, $domain);
$linkClass->updateWeight($result->weight);
$linkClass->addKeyword($result->keyword);
$this->resultData[$result->linkId] = $linkClass;
}
}
}
}
And the regular expression search function:
private function searchRegexp($selectedDomains) {
//get stored search value
$searchValues = $this->searchValue;
//replace asterisk and exclamation mark (the wildcard characters of the search syntax) with their MySQL LIKE equivalents
$searchValues = str_replace("*", "%", $searchValues);
$searchValues = str_replace("!", "_", $searchValues);
// empty result array to prevent previous results to interfere
$this->resultData = array();
//searched phrase can be multiple keywords, so split it by space and get results for each keyword
foreach (explode(" ", $searchValues) as $keywordName) {
//set default link result weight to -1 (default value)
$weight = -1;
//select all keywords, which match searched keyword (or its regular expression)
$keywords = $this->database->table('keyword')->where('keyword LIKE ?', $keywordName);
foreach ($keywords as $keyword) {
//compute keyword md5 sum to determine which table should be used to match its links
$md5 = md5($keyword->keyword);
//get all link ids from linkkeyword relation table
$keywordJoinLink = $keyword->related('linkkeyword' . $md5[0])->where('link.domain.searchable','1');
//loop found links
foreach ($keywordJoinLink as $link) {
//store link weight, for later result sort
$weight = $link->weight;
//get link ID
$linkId = $link->linkId;
//check if link already exists in results, to prevent duplicity
$keyExists = array_key_exists($linkId, $this->resultData);
//if link already exists in result set, just update its weight and insert matching keyword for later keyword tag specification
if ($keyExists) {
$this->resultData[$linkId]->updateWeight($weight);
$this->resultData[$linkId]->addKeyword($keyword->keyword);
//if link isnt in result yet, insert it
} else {
//get link reference
$linkData = $link->ref('link', 'linkId');
//get information about domain, to which link belongs (location, flagPath,...)
$domainData = $linkData->ref('domain', 'domainId');
//if is domain searchable and was selected before search, add link to result set. Otherwise ignore it
if ($domainData->searchable == 1 && in_array($domainData->id, $selectedDomains)) {
//create new link instance
$linkClass = new search\linkClass($linkData, $domainData);
//insert matching keyword to links keyword set
$linkClass->addKeyword($keyword->keyword);
//set links weight
$linkClass->updateWeight($weight);
//insert link into result set
$this->resultData[$linkId] = $linkClass;
}
}
}
}
}
}
Your question is mostly one of opinion, so you may want to include the criteria that would allow us to answer "worth it" more objectively.
It appears you've re-invented the concept of database sharding (though without distributing your data across multiple servers).
I assume you are trying to optimize search time; if that's the case, I'd suggest that 2.5 million records on modern hardware is not a particularly big performance challenge, as long as your queries can use an index. If you can't use an index (e.g. because you're doing a regular expression search), sharding will probably not help at all.
My general recommendation with database performance tuning is to start with the simplest possible relational solution, keep tuning that until it breaks your performance goals, then add more hardware, and only once you've done that should you go for "exotic" solutions like sharding.
This doesn't mean using prayer as a strategy. For performance-critical applications, I typically build a test database where I can experiment with solutions. In your case, I'd build a database with your schema without the "sharding" tables, and then populate it with test data (either write your own population routines, or use a tool like DBMonster). Typically, I'd go for at least double the size I expect in production. You can then run and tune queries to prove, one way or another, whether your schema is good enough. It sounds like a lot of work, but it's much less work than your sharding solution is likely to bring along.
There are (as @danFromGermany comments) solutions that are optimized for text search, and you could use MySQL's fulltext search features rather than regular expressions.
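As a rough illustration of the "simplest possible relational solution", the sketch below keeps all link <-> keyword relations in one indexed table and answers a wildcard search with a single join. It uses Python's sqlite3 module as a stand-in for MySQL, and the table layout, column names, index name and sample rows are assumptions made for the example, not the asker's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE keyword (id INTEGER PRIMARY KEY, keyword TEXT UNIQUE);
CREATE TABLE linkkeyword (linkId INTEGER, keywordId INTEGER, weight INTEGER);
CREATE INDEX idx_linkkeyword_keyword ON linkkeyword (keywordId);
""")
conn.executemany("INSERT INTO keyword (id, keyword) VALUES (?, ?)",
                 [(1, "temp"), (2, "temple"), (3, "report")])
conn.executemany("INSERT INTO linkkeyword VALUES (?, ?, ?)",
                 [(10, 1, 5), (11, 2, 3), (12, 3, 7), (10, 2, 1)])

def search_prefix(conn, pattern):
    # One join over the single relation table; no per-keyword md5 lookup.
    sql = """SELECT lk.linkId, k.keyword, lk.weight
             FROM keyword k JOIN linkkeyword lk ON lk.keywordId = k.id
             WHERE k.keyword LIKE ?
             ORDER BY lk.linkId, k.keyword"""
    return conn.execute(sql, (pattern,)).fetchall()

matches = search_prefix(conn, "tem%")
print(matches)
```

Whether this is fast enough at 2.5 million rows is exactly what the test database described above is meant to show.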
I'm working with MATCH...AGAINST in PHP/SQL for the first time, but there is one thing bothering me and I can't figure out how to fix it. This is my code:
SELECT * FROM m_artist WHERE match(artist_name) against('". $_POST['article_content'] ."' IN BOOLEAN MODE)
And this is $_POST['article_content']:
Wildstylez Brothers Yeah Frontliner Waveliner
My output should be Wildstylez, Frontliner and Waveliner, because those are in my database. I do get them, but I also get the Vodka Brothers, 2 Brothers of Hardstyle and more because of the word Brothers. How do I make the SQL select only literal matches?
Full-text search is actually a rather misleading name: your query is searched for within the full text (as Google does), but there is no guarantee that the full text equals your query.
So, according to the documentation on Boolean Full-Text Searches, your input Wildstylez Brothers Yeah Frontliner Waveliner is interpreted as "artist_name contains at least one of the words Wildstylez, Brothers, Yeah, Frontliner and Waveliner". This is why you get e.g. the Vodka Brothers, which contains Brothers. For Google-like purposes this is just what you want, since you usually want details on something you only know part of, as in "show me articles on music".
You probably want to use
artist_name LIKE '%name_part1%' OR artist_name LIKE '%name_part2%' ...
or
artist_name IN ('exact_name1', 'exact_name2', ...)
The simplest case would be doing something like
$names = explode(' ', $_POST['article_content']);
$name_searches = array_map(function($a) {return 'artist_name = \''.mysql_real_escape_string($a).'\'';}, $names);
$sql = "SELECT * FROM m_artist WHERE ".implode(" OR ", $name_searches);
but you would lose the ability to find 2 Brothers of Hardstyle, as the name itself contains a space.
Another approach is to prefix every word with '+' and stick with MATCH() AGAINST(); you will then find only artists whose names include every word given.
Please provide more context if this is not what you are looking for.
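The exact-name variant can be sketched as follows, in Python with the sqlite3 module standing in for MySQL (an assumption, as is the helper name find_exact_artists). One placeholder is generated per word, so the values are bound by the driver rather than escaped by hand.

```python
import sqlite3

def find_exact_artists(conn, text):
    names = text.split()
    if not names:
        return []
    # Build "?, ?, ?" with one placeholder per candidate name.
    placeholders = ", ".join("?" for _ in names)
    sql = ("SELECT artist_name FROM m_artist "
           "WHERE artist_name IN (%s) ORDER BY artist_name" % placeholders)
    return [row[0] for row in conn.execute(sql, names)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE m_artist (artist_name TEXT)")
conn.executemany("INSERT INTO m_artist VALUES (?)",
                 [("Wildstylez",), ("Frontliner",), ("Waveliner",),
                  ("Vodka Brothers",), ("2 Brothers of Hardstyle",)])
found = find_exact_artists(conn, "Wildstylez Brothers Yeah Frontliner Waveliner")
print(found)
```

As with the PHP version, splitting on spaces means multi-word artist names can never match.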
I am trying to append strings to a field if they do not already exist with:
mysql_query("UPDATE gals4 SET tags = CONCAT(tags, '$instag') WHERE id = '$instagnum' AND tags NOT LIKE '$instag'");
This just appends to 'tags' regardless of whether the string already exists in the field. What am I doing wrong here?
To answer your immediate question, you must put the character % at the beginning and end of the match string:
"AND tags NOT LIKE '%$instag%'"
However, you should be aware that this is a terrible way to store data in an SQL database. There are at least three problems:
1. If you have tags that embed other tags ("cat" and "scat" for instance) you will find the wrong records unless you write very complicated comma-based searches.
2. These searches can never be indexed and will therefore become very slow as the number of records grows.
3. You cannot verify the tags used against a list of allowed tags, guarantee that only allowed tags are in the database, or easily present a list of existing or allowed tags.
The correct solution is to add at least one table to your database, called something like gals_tags, with columns galid and tag. Insert one record per tag into this table. If a gal has more than one tag, add one record for each tag.
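A minimal sketch of that table, in Python with sqlite3 standing in for MySQL (an assumption). The composite primary key and the helper name add_tag are illustrative choices, not part of the original answer; the key makes "append only if not already present" a property of the schema instead of a LIKE test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE gals_tags (
    galid INTEGER NOT NULL,
    tag   TEXT NOT NULL,
    PRIMARY KEY (galid, tag))""")

def add_tag(conn, galid, tag):
    # INSERT OR IGNORE is SQLite syntax; MySQL spells it INSERT IGNORE.
    conn.execute("INSERT OR IGNORE INTO gals_tags (galid, tag) VALUES (?, ?)",
                 (galid, tag))

for tag in ("cat", "scat", "cat"):
    add_tag(conn, 7, tag)

tags = [row[0] for row in conn.execute(
    "SELECT tag FROM gals_tags WHERE galid = ? ORDER BY tag", (7,))]
print(tags)
```

Here "cat" and "scat" are separate rows, so an exact comparison on tag never confuses one with the other.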
You need another variable, $instagwildcard, which is $instag with a % after it (and possibly before it).
As written, your query looks for a string that is not an exact match for $instag, and I assume you also want to exclude strings that contain $instag somewhere inside them rather than only exact matches:
$instagwildcard = "%" . $instag . "%";
mysql_query("UPDATE gals4 SET tags = CONCAT(tags, '$instag') WHERE id = '$instagnum' AND tags NOT LIKE '$instagwildcard'");
But I also agree with Larry's comment.
I'm new to web design, especially backend design so I have a few questions about implementing a search function in PHP. I already set up a MySQL connection but I don't know how to access specific rows in the MySQL table. Also is the similar text function implemented correctly considering I want to return results that are nearly the same as the search term? Right now, I can only return results that are the exact same or it gives "no result." For example, if I search "tex" it would return results containing "text"? I realize that there are a lot of mistakes in my coding and logic, so please help if possible. Event is the name of the row I am trying to access.
$input = $_POST["searchevent"];
while ($events = mysql_fetch_row($Event)) {
    $eventname = $events[1];
    $eventid = $events[0];
    // compare the search term with the event name; $hold receives the similarity in percent
    $diff = similar_text($input, $eventname, $hold);
    if ($hold == '100') {
        echo $eventname;
        break;
    } else {
        echo "no result";
    }
}
Thank you.
I've noticed some of the comments mentioned more efficient ways of performing the search than with the "similar text" function, if I were to use the LIKE function, how would it be implemented?
A couple of different ways of doing this:
The faster one (performance wise) is:
select * FROM Table where keyword LIKE '%value%'
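The trick in this one is the placement of the %, which is a wildcard matching any sequence of characters: '%value' matches anything that ends with the value, 'value%' anything that begins with it, and '%value%' anything that contains it.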
A more flexible but (slightly) slower one could be the REGEXP function:
Select * FROM Table WHERE keyword REGEXP 'value'
This is using the power of regular expressions, so you could get as elaborate as you wanted with it. However, leaving as above gives you a "poor man's Google" of sorts, allowing the search to be bits and pieces of overall fields.
The sticky part comes in if you're trying to search names. For example, either would find the name "smith" if you searched SMI. However, neither would find "Jon Smith" if there was a first and last name field separated. So, you'd have to do some concatenation for the search to find either Jon OR Smith OR Jon Smith OR Smith, Jon. It can really snowball from there.
Of course, if you're doing some sort of advanced search, you'll have to condition your query accordingly. So, for instance, if you wanted to search first, last, address, then your query would have to test for each:
SELECT * FROM table WHERE first LIKE '%value%' OR last LIKE '%value%' OR address LIKE '%value%'
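When the search value comes from user input, the LIKE pattern is better passed as a bound parameter, with any % or _ typed by the user escaped first. A minimal sketch in Python, using sqlite3 as a stand-in for MySQL; the events table, its sample rows and the helper names are assumptions made for the example.

```python
import sqlite3

def like_pattern(term):
    # Escape LIKE wildcards typed by the user, then wrap in % for "contains".
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return "%" + escaped + "%"

def search_events(conn, term):
    sql = "SELECT name FROM events WHERE name LIKE ? ESCAPE '\\' ORDER BY name"
    return [row[0] for row in conn.execute(sql, (like_pattern(term),))]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO events (name) VALUES (?)",
                 [("text editing workshop",), ("context talk",),
                  ("100% fun run",), ("picnic",)])
print(search_events(conn, "tex"))
print(search_events(conn, "100%"))
```

Searching for "tex" finds every name containing those letters, which is the "nearly the same" behaviour the question asks for.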
Look at the example below:
$word2compare = "stupid";
$words = array(
'stupid',
'stu and pid',
'hello',
'foobar',
'stpid',
'upid',
'stuuupid',
'sstuuupiiid',
);
foreach ($words as $str) {
    similar_text($str, $word2compare, $percent);
    if ($percent > 90) { // Change the threshold to 80, 70, 60 and see how the matches change
        print "Comparing '$word2compare' with '$str': $percent%\n";
    }
}
Use the $percent value to decide how strong a match you require.
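For comparison, the same thresholding idea can be sketched in Python with difflib.SequenceMatcher from the standard library. This is only a rough analogue: its ratio() is computed differently from PHP's similar_text, so the scores are not interchangeable, and the helper name close_matches is hypothetical.

```python
from difflib import SequenceMatcher

def close_matches(target, candidates, threshold=0.9):
    # ratio() is 1.0 for identical strings and falls toward 0.0 as they diverge.
    scored = [(c, SequenceMatcher(None, target, c).ratio()) for c in candidates]
    return [(c, round(r, 3)) for c, r in scored if r >= threshold]

words = ["stupid", "stu and pid", "hello", "foobar", "stpid", "upid"]
result = close_matches("stupid", words)
print(result)
```

Lowering the threshold admits progressively weaker matches, just as lowering the percentage does in the PHP example.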
Background:
I have this "with rollup" query defined in MySQL:
SELECT
case TRIM(company)
when 'apple' THEN 'AAPL'
when 'microsoft' THEN 'MSFT'
else '__xx__'
END as company
,case TRIM(division)
when 'hardware' THEN Trim(division)
when 'software' THEN Trim(division)
else '__xx__'
END as division
,concat( '$' , format(sum(trydollar),0)) as dollars
FROM pivtest
GROUP BY
company, division with rollup
And it generates this output:
AAPL;hardware;$279,296
AAPL;software;$293,620
AAPL;__xx__;$572,916
MSFT;hardware;$306,045
MSFT;software;$308,097
MSFT;__xx__;$614,142
__xx__;__xx__;$1,187,058
If you have used "with rollup" queries in MySQL before, you can most likely infer the structure of my source table.
Question:
Given this raw output of MySQL, what is the easiest way to get a "tree" structure like the following?
AAPL
hardware;$279,296
software;$293,620
Total; $572,916
MSFT
hardware;$306,045
software;$308,097
Total;$614,142
Total
$1,187,058
Easiest is to do it in whatever client program you're using to receive and show the user MySQL's output -- definitely not easiest to implement presentation-layer functionality in the data layer!-) So tell us what language &c is in your client program and we may be able to help...
Edit: giving a simple Python client-side solution at the original asker's request.
With Python's DB API, results from a DB query can be most simply seen as a list of tuples. So here's a function to format those results as required:
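def formout(results):
    # map the rollup marker to the label shown for subtotal rows
    marker = dict(__xx__=' Total')
    current_stock = None
    for stock, kind, cash in results:
        if stock != current_stock:
            # first row of a new group: print the group header once
            print(marker.get(stock, stock).strip())
            current_stock = stock
        if kind in marker and stock in marker:
            # grand-total row: blank out the second column
            kind = ' '*8
        print(' %s;%s' % (marker.get(kind, kind), cash))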
marker is a dictionary to map the special marker '__xx__' into the desired string in the output (I'm left-padding it appropriately for the "intermediate" totals, so when I print the final "grand total", I .strip() those blanks off). I also use it to check for the special case in which both of the first two columns are the marker (because in that case the second column needs to be turned into spaces instead). Feel free to ask in comments for any further clarification of Python idioms and use that may be necessary!
Here's the output I see when I call this function with the supplied data (turned into a list of 7 tuples of 3 strings each):
AAPL
hardware;$279,296
software;$293,620
Total;$572,916
MSFT
hardware;$306,045
software;$308,097
Total;$614,142
Total
;$1,187,058
The space-alignment is not identical to that I see in the question (which is a little inconsistent in terms of how many spaces are supposed to be where) but I hope it's close enough to what you want to make it easy for you to adjust this to your exact needs (as you're having to translate Python into PHP anyway, the space-adjustment should hopefully be the least of it).