How to handle special characters in fuzzy search query - php

My Solr query is implemented in two parts: the first query does an exact search, and if it finds no results, a second query does a fuzzy search.
Everything works fine except in situations like this: a user enters "burg +".
The exact search returns no records, so the second query is called to do a fuzzy search. The problem is that my fuzzy query does not understand special characters like +, - and *, and throws an error. If I don't pass special characters it works fine, but in the real world a user can include such characters in their search, which will throw an error.
I am stuck on this and don't know how to resolve it.
This is what my exact search query looks like:
$query1="(business_name:$data*^100 OR city_name:$data*^1 OR
locality_name:$data*^6 OR business_search_tag_name:$data*^8 OR
type_name:$data*^7) AND (business_active_flag:1) AND
(business_visible_flag:1) AND (delete_status_businessmasters:0)";
This is what my fuzzy query looks like:
$query2='(_query_:%20"{!complexphrase%20qf=business_name^100+type_name^0.4+locality_name^6%27}%20'.$url_new.')AND(business_active_flag:1)AND(business_point:[1.5 TO 2.0])&q.op=AND&wt=json&indent=true';
This is the error I am getting:
Cannot parse ' must~1 *~N': '*' or '?' not allowed as first character in WildcardQuery
I am new to Solr and don't know how to tackle this situation.
Details of what I am using:
Solrphpclient
php
solr 4.9

I see that you are using solrphpclient. You need to make changes in the service.php file so that these special characters are replaced with a blank or whatever else you want.
This will take care of the problem you are facing:
$params=str_replace("%", "", $params);
$params=str_replace("*", "", $params);
$params=str_replace("&", "", $params);
You need to put this in the search function, or inside the custom function that I assume you are using for the fuzzy query.
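The three replacements above do not touch + or -, which the question also mentions. A minimal sketch of the same idea follows, assuming the fuzzy terms are built by appending ~N to each whitespace-separated token (the error message ' must~1 *~N' suggests this, but it is a guess). The function name buildFuzzyTerms and its parameters are hypothetical and not part of solrphpclient.

```php
<?php
// Hypothetical helper, not part of solrphpclient.
// Removes Lucene/Solr query-syntax characters from user input, drops tokens
// that become empty, and appends the fuzzy operator to what is left.
function buildFuzzyTerms($input, $distance = 1)
{
    // Replace query-syntax characters (and %) with spaces.
    $clean = preg_replace('/[+\-&|!(){}\[\]^"~*?:\\\\\/%]/', ' ', $input);

    // Split on whitespace; PREG_SPLIT_NO_EMPTY discards empty tokens,
    // so a lone "+" or "*" never reaches the query as "*~1".
    $tokens = preg_split('/\s+/', trim($clean), -1, PREG_SPLIT_NO_EMPTY);

    // Append the fuzzy operator to every remaining token.
    $fuzzy = array_map(function ($token) use ($distance) {
        return $token . '~' . $distance;
    }, $tokens);

    return implode(' ', $fuzzy);
}
```

With this sketch, an input that consists only of special characters produces an empty string, so the caller should check for that and skip the fuzzy query instead of sending it to Solr.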

Related

How to Escape Special Characters in Apache solr in php

I want to escape the special characters in this Solr query:
stringfield:/"name":"Elan"/.
I tried this:
stringfield:/\".name.\":\".Elan.\"/
but it's not working. Is there any other way to solve this?
I'm still not getting your setup, but I guess you do a bit too much escaping. And the query in your question looks kind of odd concerning the addressing of fields.
A filter query should only consist of field:value, not field1:field2:value or something...
As a tip, try to assemble the URL manually and get it working. Or use the Solr Admin UI, where you can assemble your query in a form-based manner. You'll also get the query URL from there.
Have you tried to print the URL you assemble in your PHP code and invoke it manually?
Your query URL should look simply like this:
http://localhost:8983/solr/mycore/select?q=*&fq=myfield:"myvalue"
or URL-escaped:
http://localhost:8983/solr/mycore/select?q=*&fq=myfield%3A%22myvalue%22
I guess, your PHP code should look like this:
$solrq .= '&fq=stringfield:"' . urlencode($_POST['name']) . '"';
where $_POST['name'] is hopefully just Elan.
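As a hedged alternative, the whole parameter value (field name, colon and quotes included) can be URL-encoded in one step with http_build_query, which yields the URL-escaped form shown above. This is only a sketch: the function name buildSelectUrl is hypothetical, and the match-all query *:* is an assumption (the answer above uses q=*).

```php
<?php
// Hypothetical helper: builds a Solr select URL with a phrase filter query.
function buildSelectUrl($baseUrl, $field, $value)
{
    // Escape backslashes and double quotes so the value cannot end the phrase early.
    $phrase = '"' . addcslashes($value, "\"\\") . '"';

    $params = array(
        'q'  => '*:*',                  // assumption: match-all main query
        'fq' => $field . ':' . $phrase, // e.g. stringfield:"Elan"
    );

    // http_build_query URL-encodes every value, including ':' and '"'.
    return $baseUrl . '?' . http_build_query($params);
}
```

Printing the result of buildSelectUrl and opening it in a browser is a quick way to compare it with a URL assembled in the Solr Admin UI.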

Display if URL matches database characters

Let me first give you a little background to explain what I'm trying to do. My websites use URL's that look like this: MySite/World/Isthmus_of_Panama
I'm working on a major upgrade (and may eventually upgrade further by switching to a CMS, like Drupal or WordPress), and it sounds like the general consensus is that URL's with hyphens are better than underscores. So I'm changing my URL's to MySite/World/Isthmus-of-Panama. In the meantime, I'm also trying to figure out if I should change my URL's to all lower case, and what about special symbols like accents or parentheses?
And what if someone typed in a URL that looks like MySite/World/Isthmus of Panama ? Wikipedia has a script that automatically converts the spaces to underscores. It will also default to the correct URL if you use the wrong case.
Of course, if I change my URL's, I'll also have to forward visitors from my old URL's. It's getting very confusing.
Then I realized that I could cover all of the bases with a script that accepts any URL that matches the characters in my database, 1) regardless of case, 2) and regardless of whether multiple words are separated by hyphens, underscores, spaces or %20. So imagine the following URL's:
MySite/World/Isthmus-of-Panama
MySite/World/Isthmus of Panama
MySite/World/Isthmus%20of%20Panama
MySite/World/isthmus_of_panama
MySite/World/Isthmus-of_PANAMA
Where the database value is Isthmus-of-Panama.
Below is one of my queries, where $MyURL = the database value URL (e.g. Isthmus-of-Panama). Can anyone tell me how to modify it so that all of the above URL's will be accepted, with the page then defaulting to the database value?
Wikipedia has a similar feature. If you go to their article about Crazy Horse, then replace the URL Crazy_Horse with crazy_horse or Crazy Horse, it will default to Crazy_Horse. Thanks.
$sql= "SELECT COUNT(URL) AS num FROM gs_reference
WHERE URL = :MyURL";
$stmt = $pdo->prepare($sql);
$stmt->bindParam(':MyURL',$MyURL,PDO::PARAM_STR);
$stmt->execute();
$Total = $stmt->fetch();
switch($Total['num'])
{
case 1:
// DISPLAY A PAGE
break;
case 0:
// 404 NOT FOUND ERROR
break;
default:
// DUPLICATE RESULTS
break;
}
I would convert the input, for example Isthmus%20of%20Panama, to the database value in PHP.
If the converted value is equal to the input, display the page without redirecting; otherwise do a 301 redirect to the URL with the converted text.
EDIT
I would create a slug column in the database (that is what it is generally called) containing the normalized text (ASCII characters and hyphens), and create a unique index on it.
You could use this function to generate the slug in php: PHP function to make slug (URL string)
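A minimal sketch of the normalize-then-redirect idea, assuming the slug column stores a lowercase, hyphen-separated value. The function names normalizeSlug and needsRedirect are hypothetical, and strtolower only handles ASCII, so accented characters would need something like mb_strtolower or transliteration.

```php
<?php
// Hypothetical helper: turns any accepted spelling of a URL segment
// (spaces, %20, underscores, hyphens, mixed case) into one lookup key.
function normalizeSlug($segment)
{
    // Decode %20 and other percent-escapes.
    $decoded = rawurldecode($segment);

    // Collapse runs of whitespace, underscores and hyphens into one hyphen.
    $hyphenated = preg_replace('/[\s_\-]+/', '-', $decoded);

    // Lowercase (ASCII only) and strip leading or trailing hyphens.
    return strtolower(trim($hyphenated, '-'));
}

// Hypothetical helper: true when the request matches the canonical URL value
// but is spelled differently, i.e. when a 301 redirect is appropriate.
function needsRedirect($requestedSegment, $canonical)
{
    return normalizeSlug($requestedSegment) === normalizeSlug($canonical)
        && $requestedSegment !== $canonical;
}
```

The normalized value would be bound to the prepared statement in place of $MyURL, with the comparison made against the slug column rather than the display URL.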

preg_replace limit issue, handling array values

I've been working with the Sphider search engine for an internal website, we need to be able to quickly search for contact details in exported .htm(l) files.
$fulltxt = ereg_replace("[_A-Za-z0-9-]+(\.[_A-Za-z0-9-]+)*#[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*(\.[A-Za-z]{2,3})", "\\0", $fulltxt);
I am replacing e-mail addresses with a convenient mailto: link so users can open Outlook straight from the search results.
However,
while (preg_match("/[^\>](".$change.")[^\<]/i", " ".$fulltxt." ", $regs)) {
$fulltxt = preg_replace("/".$regs[1]."/i", "<b>".$regs[1]."</b>", $fulltxt);
}
It replaces all matches in the search results with bold tags, which results in the tags being included in Outlook's 'To...' field. It looks something like this in HTML (thanks Yuriy):
<b>name</b>.surname#domain
I have tried adding a value to the 'limit' parameter:
while (preg_match("/[^\>](".$change.")[^\<]/i", " ".$fulltxt." ", $regs)) {
$fulltxt = preg_replace("/".$regs[1]."/i", "<b>".$regs[1]."</b>", $fulltxt, 1);
}
Supposedly this should solve my problem by replacing only the first occurrence (the name, since the pattern is name, phone number, e-mail and we always search by name). Instead it makes the search incredibly slow, to the point that I get a timeout message from the server. I've been trying various solutions but have been out of luck.
Any ideas? Am I doing something wrong?
Thanks.
(*Original heavily edited).
Did I understand you right that something like this happens?
<b>email#domain</b>
Why don't you put the bold tags into the search results first, and only then apply the "mailto:" anchors to the e-mail addresses? The added <b> tags would be easy to filter out in the pattern in that second step.
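A rough sketch of that ordering, under two assumptions: the addresses use @ as the separator (the excerpts above show # in that position), and a simplified address pattern is good enough. The function name highlightThenLink is hypothetical.

```php
<?php
// Hypothetical helper: bold the search term first, then wrap e-mail
// addresses in mailto: links whose target has the <b> tags stripped.
function highlightThenLink($text, $term)
{
    // Step 1: bold every occurrence of the term in a single pass.
    // preg_quote keeps characters in the term from being read as regex syntax.
    $bolded = preg_replace('/(' . preg_quote($term, '/') . ')/i', '<b>$1</b>', $text);

    // Step 2: match addresses that may now contain <b> or </b> tags.
    $email = '/(?:<\/?b>|[\w.\-])+@(?:<\/?b>|[\w\-])+(?:\.(?:<\/?b>|[\w\-])+)+/i';

    return preg_replace_callback($email, function ($match) {
        // The link target must be tag-free; the visible text keeps the bold.
        $plain = strip_tags($match[0]);
        return '<a href="mailto:' . $plain . '">' . $match[0] . '</a>';
    }, $bolded);
}
```

Because each step is a single replace call rather than a while loop, the text is scanned a fixed number of times, which avoids the repeated rescanning that can lead to a timeout.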

Structuring a Gdata Spreadsheet query

I am trying to build a Zend_Gdata_Spreadsheets_ListQuery and I can't find any references that explain what the expected query syntax is and what types of queries you can and cannot perform. The closest I have been able to come to finding anything is the [Google Data APIs Client Library (1.41.1)] (https://developers.google.com/gdata/javadoc/com/google/gdata/client/spreadsheet/ListQuery#ListQuery(java.net.URL)), which describes the function setSpreadsheetQuery as follows:
setSpreadsheetQuery
public void setSpreadsheetQuery(java.lang.String query) Sets the
structured spreadsheet query. Parameters: query - the query such as
"name = 'Sonja' and state = 'Georgia'"
This works just fine if you are looking for a cell whose column title is "name" and that contains the text "Sonja" and nothing else. I am looking for cells containing "Sonja" as part of the cell's text. A cell in the "name" column with the value "Sonja the Awesome", for example, would not match the search above. name=Sonya* causes an error and name="Sonya*" returns no results.
So, does anybody know where I can find a rundown of what the expected "structure" for the "structured spreadsheet query" is?
Have you tried using regex ? Also, make sure to escape the regex special characters.
Something like \bSonja\b

Problem reading data from file special characters

My previous question and this one are somewhat related, so please have a look at my previous question. I did not find any other way to unserialize the data, so I am falling back on string operations.
I am able to get the whole content of the file, but not the specific string I need from that content.
I want to search for a specific string in the content, but the functions stop working when they reach the first special character in the string. If I search for something that appears before the special character, it works properly.
PHP's string functions stop processing as soon as they encounter the first special character in the string, so they do not give me the correct output.
Originally the characters look like this (^#):
:"Mage_Core_Model_Message_Collection":2:{s:12:"^#*^#_messages";a:0:{}s:20:"^#*^#_lastAddedMessage";N;}
but when I echo them they are displayed as ?
Here is the code I tried:
$file='/var/www/html/products/var/session/sess_ciktos8icvk11grtpkj3u610o3';
$contents=file_get_contents($file);
$contents=htmlspecialchars($contents);
//$contents=htmlentities($contents);
echo $contents;
$restData=strstr($contents,'"id";s:4:"');
echo $restData;
$id=substr($restData,0,strpos($restData,'"'));
echo $id;
I changed the default_charset to iso-8859-1 and also to utf-8, but it does not work with either.
Please let me know how I can resolve this.
Thanks.
These characters that you see as ^# are actually null bytes. They have no proper display form, nor are they meant to be displayed: they are the engine's internal representation of protected properties. You're not supposed to mess with them.
As for resolving, it'd be nice to know what kind of resolution you seek - what result are you trying to achieve?
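If the goal is only to read one string value such as "id" out of the raw session data, one possible sketch is to search the unescaped contents: by default htmlspecialchars turns every double quote into &quot;, so a needle containing " can no longer match after that call. The function name extractSerializedString is hypothetical, and the sketch assumes the value is stored as a serialized string of the form "key";s:<length>:"<value>".

```php
<?php
// Hypothetical helper: reads the string stored under $key from raw
// PHP-serialized data without unserializing it. Null bytes in the
// data are harmless because PHP strings are binary-safe.
function extractSerializedString($raw, $key)
{
    // Look for  "key";s:<length>:"  and capture the declared length.
    $pattern = '/"' . preg_quote($key, '/') . '";s:(\d+):"/';
    if (!preg_match($pattern, $raw, $m, PREG_OFFSET_CAPTURE)) {
        return null; // key not present as a string value
    }

    // The value starts right after the matched prefix.
    $start = $m[0][1] + strlen($m[0][0]);

    // Read exactly as many bytes as the serialized length says.
    return substr($raw, $start, (int) $m[1][0]);
}
```

Escaping with htmlspecialchars can then be applied only to the extracted value, at the point where it is echoed.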
