Alter Magento Index Fulltext Search? - php

I have been given a unique task and am on the last leg of it, but this sub-task is proving extremely difficult. For background: we run a Magento site and use a custom-built Solr search page. I am using phpSolrClient to parse the Solr XML and return usable results, from which I build the search results page.
The task is to have an attribute in the Magento back end, let's call it "search_tags". The goal is to be able to enter tags and their weights, delimited by commas:
e.g. sight^2,hearing^1,smell^3
I would like to edit the code in Magento's fulltext reindex to break the string apart and insert each tag X times into the fulltext1_en field. So it would add "sight" twice, "hearing" once, and "smell" three times. This will allow us to, say, show a blender when someone searches for juicers, even though the term "juicer" or "juice" does not appear in the blender's fulltext1_en string. I have written the code to pull, split and iterate. However, I am at a standstill because I don't know which code to edit to include this in fulltext1_en during the reindex process. If anyone has experience with editing Magento's fulltext reindex, your input would be greatly appreciated! I looked in Indexer.php, but everything in that file is ambiguous at best, so that was no help. Gotta love Magento!

OK, for those looking to give "weighted tags" to a custom search in Magento using Solr: I was up all night getting this right, but it works.
First, create an attribute in Magento and apply it to all products. I named mine "search_tags".
Next, enter a value in the following format in that attribute for a test item:
dude^25,crazyman^25,weirdsearch^25
Each word is followed by a caret and then the weight you want to give it. (The weight is the number of times the word is repeated when it is added to fulltext1_en.)
After that is done, open the following file:
/app/code/core/Mage/CatalogSearch/Model/Mysql4/Fulltext.php
I know it says Mysql4, but pay no attention: Solr uses this index.
Around line 500, you will see the following block:
if ($selects) {
    $select = '('.join(')UNION(', $selects).')';
    $query = $this->_getWriteAdapter()->query($select);
    while ($row = $query->fetch()) {
JUST BELOW this block insert the following:
NOTE: Do not use the attribute ID I have listed here; it is unique to my setup. You will have to search your database to find yours. I used a JOIN between eav_attribute and catalog_product_entity_varchar and a SELECT of attribute_id and value WHERE entity_id = (insert your product ID here). It's a pain, but it's the only way I found. This returns all the varchar attributes for that product. Look for the one that has the tags we entered earlier and get its ID. Insert that into the code below.
$attr_val = $row['value']; // Default: pass the attribute value through unchanged
if ($row['attribute_id'] == 457) { // 457 is the ID of MY search_tags attribute, yours WILL be different! See the note above for how to find it
    $attr_val = ''; // Start with an empty string
    $pieces = explode(',', $row['value']); // Split the attribute value on commas
    foreach ($pieces as $piece) {
        $val = explode('^', trim($piece)); // Split each "tag" into word and weight on the caret
        $weight = isset($val[1]) ? (int) $val[1] : 0; // No weight given: skip the tag (avoids an undefined-index notice)
        for ($i = 1; $i <= $weight; $i++) { // Repeat as many times as the number on the right side of the caret
            $attr_val .= ' ' . $val[0]; // Append the word on the left side of the caret
        }
    }
}
$result[$row['entity_id']][$row['attribute_id']] = $attr_val; // Modified from original
After you insert that, comment out the following original line:
$result[$row['entity_id']][$row['attribute_id']] = $row['value']; // ORIGINAL LINE -- COMMENT OUT, DO NOT DELETE
Now run a fulltext reindex, and your fulltext1_en should show that you've added "dude", "crazyman", and "weirdsearch" 25 times each! When the index is complete, search for any of the tags in your site search: the item you added the tags to should show up at or near the top. Enjoy!
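The tag expansion itself can be tried outside Magento. The following is a minimal sketch, assuming the "word^weight" format described above; expandWeightedTags() is a hypothetical helper name, not part of Magento, and unlike the inline snippet it joins words with single spaces and skips tags that have no valid weight.

```php
<?php
/**
 * Expand a "word^weight,word^weight" string into a space-separated
 * string in which each word is repeated `weight` times.
 * Hypothetical helper for illustration only; not a Magento function.
 */
function expandWeightedTags(string $input): string
{
    $chunks = [];
    foreach (explode(',', $input) as $piece) {
        // Split into at most two parts: the word and its weight
        $parts  = explode('^', trim($piece), 2);
        $word   = trim($parts[0]);
        $weight = isset($parts[1]) ? (int) $parts[1] : 0;

        // Skip empty words and tags without a positive weight
        if ($word === '' || $weight < 1) {
            continue;
        }
        $chunks[] = implode(' ', array_fill(0, $weight, $word));
    }
    return implode(' ', $chunks);
}

echo expandWeightedTags('sight^2,hearing^1,smell^3'), "\n";
```

Keeping the expansion in one function like this would also make it easier to test the weights before running a full reindex.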


Ordering and Selecting frequently used tags

I have looked on Stack Overflow for a solution to this but couldn't find a good answer covering the issue I'm having. Essentially, what I'm trying to achieve is to list the 15 most frequently used tags across all my users' subjects.
This is how I currently select the data
$sql = mysql_query("SELECT subject FROM `users`");
$row = mysql_fetch_array($sql);
I apologise that the code looks nothing like what I'm trying to achieve; I really don't have any clue where to begin and came here for a possible solution. This would work fine for fetching the subjects, but my problem is that the subjects contain ordinary words along with the hashtags. An example room subject would look like: hey my name is example #follow me. How would I grab only the #follow, and once I've grabbed all the hashtags from all of the subjects, echo the 15 most frequent?
Again, I appreciate anyone's help. This was the closest post I found to solving my issue, but it was not useful.
Example
Here are three room subjects:
`Hello welcome to my room #awesome #wishlist`
`Hey hows everyone doing? #friday #awesome`
`Check out my #wishlist looking #awesome`
This is what I'm trying to view them as
[3] #awesome [2] #wishlist [1] #friday
What you want to achieve here is pretty complex for an SQL query, and you are likely to run into efficiency problems if you parse every subject each time you run this code.
The best solution is probably to have a table that associates tags with users. You can update this table every time a user changes their subject. Getting the number of times a tag is used then becomes trivial with COUNT(*) and GROUP BY tag.
One way would be to parse the result set in PHP. Once you query your subject line from the database, let's say you have them in the array $results, then you can build a frequency distribution of words like this:
$freqDist = [];
foreach ($results as $row)
{
    $words = explode(" ", $row);
    foreach ($words as $w)
    {
        if (array_key_exists($w, $freqDist))
            $freqDist[$w]++;
        else
            $freqDist[$w] = 1;
    }
}
You can then sort in descending order and display the distribution of words like this:
arsort($freqDist);
foreach ($freqDist as $word => $count)
{
    if (strpos($word, '#') !== FALSE)
        echo "$word: $count\n";
    else
        echo "$word: does not contain hashtag, DROPPED\n";
}
You could also use preg_match() to do fancier matching if you want but I've used a naive approach with strpos() to assume that if the word has '#' (anywhere) it's a hashtag.
Other functions of possible use to you:
str_word_count(): Return information about words used in a string.
array_count_values(): Counts all the values of an array.
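The preg_match and array_count_values() ideas above can be combined. This is a sketch only, assuming the subjects are already loaded into a plain array of strings; topHashtags() is a made-up name, the pattern treats a hashtag as "#" followed by ASCII word characters, and tags are lowercased so that #Awesome and #awesome count together (drop strtolower() if that is not wanted).

```php
<?php
/**
 * Return the most frequent hashtags across a list of subject strings,
 * as an array of tag => count sorted by count, highest first.
 * Hypothetical helper; the name and the lowercasing are assumptions.
 */
function topHashtags(array $subjects, int $limit = 15): array
{
    $tags = [];
    foreach ($subjects as $subject) {
        // Collect every "#word" token in the subject
        if (preg_match_all('/#\w+/', $subject, $matches)) {
            foreach ($matches[0] as $tag) {
                $tags[] = strtolower($tag);
            }
        }
    }

    $counts = array_count_values($tags); // tag => number of occurrences
    arsort($counts);                     // highest count first, keys preserved

    return array_slice($counts, 0, $limit, true);
}

$subjects = [
    'Hello welcome to my room #awesome #wishlist',
    'Hey hows everyone doing? #friday #awesome',
    'Check out my #wishlist looking #awesome',
];

foreach (topHashtags($subjects) as $tag => $count) {
    echo "[$count] $tag ";
}
echo "\n";
```

For the three example subjects from the question this counts #awesome three times, #wishlist twice and #friday once.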

WordPress PHP htmlspecialchars(get_field... cannot read arrays?

I am working on a WordPress Website/Blog with two main functions.
Create reports.
Compile final report.
People can write reports, selecting the fields they need, and publish them. Then at the end of the day, they can "compile" a final report from all of the reports (it concatenates the fields of all reports).
The theme is twentyten (in case it might be useful).
In my functions.php file, I concatenate everything for the final report using a foreach and lines like this:
$Urgences_Environnementales .= htmlspecialchars("<br/>".get_field('Urgences_Environnementales', $idnumber->ID));
$avezvous_regardé_des_indices_de_temps_violent_aujourdhui .= htmlspecialchars(get_field('avezvous_regardé_des_indices_de_temps_violent_aujourdhui', $idnumber->ID));
$quelle_est_cette_raison .= htmlspecialchars(get_field('quelle_est_cette_raison', $idnumber->ID));
One line per field, all the same way. After the loop is done, I update the fields:
update_field('Urgences_Environnementales',preg_replace('/(<br[\s]?[\/]?>[\s]*){2,}/', '<br/><br/>', htmlspecialchars_decode($Urgences_Environnementales)), $identificationRapport);
update_field('avezvous_regardé_des_indices_de_temps_violent_aujourdhui',preg_replace('/(<br[\s]?[\/]?>[\s]*){2,}/', '<br/><br/>', htmlspecialchars_decode($avezvous_regardé_des_indices_de_temps_violent_aujourdhui)), $identificationRapport);
update_field('quelle_est_cette_raison',preg_replace('/(<br[\s]?[\/]?>[\s]*){2,}/', '<br/><br/>', htmlspecialchars_decode($quelle_est_cette_raison)), $identificationRapport);
Then it's printed for the final report like this (this is a single field):
if(strip_tags(html_entity_decode(get_field('Urgences_Environnementales')))!=''){
simplebox(strip_tags(html_entity_decode(get_field('Urgences_Environnementales')))!='', get_field('Urgences_Environnementales'));
}
And for those fields it works perfectly.
My problem is that all my fields that hold arrays (checkboxes where people can select multiple choices, created with the ACF plugin) are empty in my database. They appear perfectly in the single reports, but they are blank in the final report.
As an example, this is what I see in my database for a single report for one of my arrays:
a:4:{i:0;s:49:"L’indice d’intensité d’orage violent (STI)";i:1;s:35:"L’indice d’orage violent (TMPV)";i:2;s:34:"Potential Severe Storm Environment";i:3;s:6:"Autres";}
The corresponding field in my final report is empty.
Does anyone have an idea how to read those arrays and record them correctly in my database? Could I transform them into strings in my foreach loop? Should I do something differently?
If you need more code don't hesitate to ask. I didn't put all my 3 functions (functions.php, report.php, finalreport.php) that I have in my WordPress theme as it would take tons of lines and I'm pretty sure the most important ones are right here. If I'm wrong, I could post the functions.
I searched and searched, but I can't seem to find the answer by myself, so I'm searching for help here.
PS: This is my first post; if you have any recommendations, send them to me and I will edit my post.
Thank you very much for your help!
PPS: I'm sorry for my English. I'm French-speaking, from Montreal, QC, Canada.
Advanced Custom Fields stores some values as serialized arrays (checkboxes, repeaters, etc.), and get_field() hands them back as arrays. Your code assumes that you will always get a string back. As you suggested in your question, the easiest way to account for this in your current code is to use the is_array() function to check the type of the returned value, and then an inner loop to build the summary. This code assumes you just want to concatenate all the values; you could just as easily use another array to make sure they are unique, etc.
// get the value from acf
$value = get_field( 'Urgences_Environnementales', $idnumber->ID );
// if it's already an array, use that, if not make it into an array with a single element
$value_arr = ( is_array( $value ) ) ? $value : array( $value );
$text = ""; // reset since this is in a loop
// concatenate each checkbox value
foreach ( $value_arr as $val ) {
    $text .= $val . ', ';
}
// append it to the main summary
$Urgences_Environnementales .= htmlspecialchars( "<br/>" . $text );
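Since the same check is needed for every field, it may be worth moving it into a small helper. The sketch below is one possible shape, assuming the checkbox values are flat arrays of strings; fieldValueToText() is a hypothetical name, and it uses implode() so that no trailing ", " is left behind.

```php
<?php
/**
 * Turn a field value that may be a string or a flat array of strings
 * into a single string. Empty values (null, false, '') become ''.
 * Hypothetical helper; it does not call ACF itself, it only formats
 * whatever value was returned for the field.
 */
function fieldValueToText($value, string $separator = ', '): string
{
    if ($value === null || $value === false || $value === '') {
        return '';
    }
    // Wrap a single value so both cases are handled the same way
    $values = is_array($value) ? $value : [$value];

    return implode($separator, array_map('strval', $values));
}

// Example with a value shaped like the checkbox data in the question
echo fieldValueToText(['Potential Severe Storm Environment', 'Autres']), "\n";
```

Inside the report loop, the value returned by get_field() would be passed through this helper before htmlspecialchars() is applied.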

Best way to store and retrieve a list of URLs

I'm putting together an extremely simple ad server with php and mysql on a website. The site currently has 4 or 5 static pages with lots of dynamic content, so URLs look like this:
index.php?pid=1 or content.php?spec=2
What I'd like to do is add a field to my table of ads to keep track of which page(s) the ad is going to be displayed on.
Should I store the URLs that have an ad as a list of comma-separated values?
Once I retrieve this variable, what's the best way to separate the values into an array?
What's the best way to split a URL into the page name, the $_GET parameter name, and its value (as in 'index, pid, 1' or 'content, spec, 2', using the examples above)?
Additional Info:
As an example, doctors.php is structured something like this:
doctors.php           Listing of doctor specialties
doctors.php?spec=#    Listing of doctors that have a particular specialty
doctors.php?pid=#     One specific doctor's information
I have a few dozen specialties, and a few hundred doctors. I want to be able to place an ad on specific pages/URLs, say doctors.php?pid=7
But I also want to be able to place an ad based on, say, all of the doctors who have a specialty with the ID of 6. That could be 60+ pages, so it doesn't really make sense to have separate table rows. If I needed to change the link on the ad, I don't want to have to change it 60 times or search for it.
Don't store as a CSV.
Add separate database rows for each ad / URL combination.
Then retrieving content will be trivial.
Store the URLs one per line in a simple file. Then you can use PHP's file() function to read the file into an array.
To split a URL, use the parse_url() function: http://php.net/parse_url
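As a rough illustration of the parse_url() suggestion, the sketch below splits one of the question's URLs into a page name and its query parameters. splitAdUrl() and the shape of its return value are assumptions made for this example; parse_str() is used so that URLs with several parameters work too.

```php
<?php
/**
 * Split a relative URL such as "doctors.php?spec=12" into the page
 * name (without extension) and an array of query parameters.
 * Hypothetical helper; the return format is an assumption.
 */
function splitAdUrl(string $url): array
{
    $parts = parse_url($url);
    if ($parts === false) {
        // parse_url() returns false for seriously malformed URLs
        return ['page' => '', 'params' => []];
    }

    // "doctors.php" -> "doctors"
    $page = pathinfo($parts['path'] ?? '', PATHINFO_FILENAME);

    $params = [];
    if (isset($parts['query'])) {
        parse_str($parts['query'], $params); // "spec=12" -> ['spec' => '12']
    }

    return ['page' => $page, 'params' => $params];
}

print_r(splitAdUrl('doctors.php?spec=12'));
print_r(splitAdUrl('index.php?v=hello&t=world'));
```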
Here's what I think will work...
My ad table will have three variables, (a, b, c)
If I have an ad placed on doctors.php?spec=12, I'll store the following:
a = 'doctors';
b = 'spec';
c = '12';
If the ad was meant to display on ALL specialty pages, I'd store:
a = 'doctors';
b = 'spec';
c = NULL;
If something is NULL, it will simply indicate ALL of a set. It seems like an elegant solution, I'll post code if it works.
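The NULL-means-all rule could be checked in PHP roughly as follows. This is only a sketch: adMatchesPage() is a hypothetical name, the ad row is assumed to be an associative array with the keys a, b and c described above, and in practice the same test could be done in the SQL WHERE clause instead.

```php
<?php
/**
 * Decide whether an ad row applies to the current page.
 * A NULL column in the ad row acts as a wildcard for that part.
 * Hypothetical helper; $ad uses the a/b/c names from the post.
 */
function adMatchesPage(array $ad, string $page, ?string $param, ?string $value): bool
{
    $current = ['a' => $page, 'b' => $param, 'c' => $value];
    foreach ($current as $column => $actual) {
        // A non-NULL column must match exactly; NULL matches anything
        if ($ad[$column] !== null && $ad[$column] !== $actual) {
            return false;
        }
    }
    return true;
}

$allSpecialties = ['a' => 'doctors', 'b' => 'spec', 'c' => null];
var_dump(adMatchesPage($allSpecialties, 'doctors', 'spec', '12')); // specialty page
var_dump(adMatchesPage($allSpecialties, 'doctors', 'pid', '7'));   // single doctor page
```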
$url = "index.php?pid=1";
$pos = strpos($url, "?");
if ($pos !== false) { // strict comparison: strpos() can return 0, which == false
    $pieces1 = explode("?", $url);
    $pieces2 = explode("=", $pieces1[1]);
    $array = array($pieces1[0], $pieces2[0], $pieces2[1]);
} else {
    $array = array($url, '', '');
}
print_r($array);
You'll need to improve this code if you want it to work with multiple variables in your URL, e.g. index.php?v=hello&t=world
Also, I did not test this code. I just wrote it without checking the functions etc.

Magento/PHP - Modifying the MySQL Queries Underlying the Front-End Product Search Box

Currently, the MySQL database queries that supply the results for the product search field on the front end seem to use "OR" linking criteria in the WHERE clause of the queries.
The reason I assume it is using "OR" is because if you search for something like "green and red plaid shirt", you will get every product with "red" (including "bored", "stored", etc), every product with "green", every product with "plaid", and every product with "shirt".
Now if I can just find out where in the code the queries are being constructed, I should be able to change that to "AND" and end up with queries like this:
SELECT `product_id` FROM `products` WHERE `search_index` LIKE '%red%' AND `search_index` LIKE '%green%' AND `search_index` LIKE '%plaid%' AND `search_index` LIKE '%shirt%';
I haven't been able to find any information by searching Google or Magento's forums. I've been poking around app/code/core/Mage/CatalogSearch/ but have not found the mother lode yet. I know that there is probably some Zend interface I should mess with but haven't found it yet.
Thanks in advance
UPDATE
The below answer does not seem to work for Magento 1.7+, since they've changed some of the search code. I'm working on a solution for that and will update later.
I'm going to answer my own question. Thanks, Anton S for the clues there but I located some key files myself and was able to implement the changes I wanted.
Here is the key file:
app/code/core/Mage/CatalogSearch/Model/Mysql4/Fulltext.php
You would copy the core structure that leads to that file into the local structure, and copy the core file there as well, like so:
app/code/local/Mage/CatalogSearch/Model/Mysql4/Fulltext.php
Then make all changes to the local file, leaving the core file alone.
Look for this bit of code around line 315, inside the function prepareResult($object, $queryText, $query):
foreach($words as $word) {
    $like[ ] = '`s`.`data_index` LIKE :likew' . $likeI;
    $bind[':likew' . $likeI] = '%' . $word . '%';
    $likeI ++;
}
if ($like) {
    $likeCond = '(' . join(' OR ', $like). ')';
}
That ' OR ' there is what was giving me thousands of useless results. For example, a search for "green and red plaid shirt" would end up showing me all things green, red, and/or plaid (including shirts, skirts, blimps, rabbits), as well as every single shirt in the store. What the user really wants to find is a product that contains ALL search terms. As noted above, you would also find results like "bored" and "stored" because they contain "red."
To solve most of the problem, you simply have to change that ' OR ' to an ' AND '. Also note that the change only applies to "LIKE" type searches, not "FULLTEXT" type. FULLTEXT doesn't work well in Magento because it excludes way too many results. The method outlined below is much better.
To make the changes:
Save the file with the change above.
Go into the admin, to System->Configuration->Catalog->Catalog Search, and make sure Search Type is "Like".
Save the configuration.
In the admin, go to System->Index Management, check the box next to Catalog Search Index and reindex it (or just reindex all). Alternatively, from the command line in the Magento root, run:
php shell/indexer.php --reindex catalogsearch_fulltext
If you also want to exclude words like "bored" when searching for "red", then you might want to implement a 2nd change in the same file.
There is another section of code inside that reads:
$bind[':likew' . $likeI] = '%' . $word . '%';
The % at the front means that "bored" is like %red%. But you can't just remove the 1st % to get the right effect because of the way the index is constructed. So instead you make these two changes:
change the above line of code to:
$bind[':likew' . $likeI] = '% ' . $word . '%';
Note the SPACE after the first % before the closing quote. This will now only find words that start with $word (e.g. red, redding, reddy, rediculous all match '% red%'), but you also have to ensure that all words will have spaces before them.
Near the top of the file, under class Mage_CatalogSearch_Model_Mysql4_Fulltext, around line 48 you should find this:
protected $_separator = '|';
I just changed it to this:
protected $_separator = ' | ';
Putting spaces on both sides of the pipe. When you reindex, there will now be spaces before and after every word. A search for "kit" will still give you results for "kitchen", but at least you won't get results for "skit".
Finally, one last change I made was to ensure plural searches return the same results as singular searches, at least for plurals ending in 's'.
I added a line of code where indicated:
foreach($words as $word) {
    $word = rtrim($word, 's'); //this line added
    $like[ ] = '`s`.`data_index` LIKE :likew' . $likeI;
    $bind[':likew' . $likeI] = '%' . $word . '%';
    $likeI ++;
}
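It simply chops the 's' off the end of every word, so now "red and green plaid shirts" returns the same results as "reds ands greens plaids shirt". Note that rtrim() removes every trailing 's', so "glass" becomes "gla"; because the match is a LIKE with a trailing %, "gla" still finds "glass", but it also widens the match.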
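To see the three changes together outside Magento, here is a standalone sketch of the condition-building loop. buildLikeCondition() is a hypothetical function written for this illustration, not Magento code; it joins the conditions with AND, puts a space after the leading %, and, as a variation on the rtrim() line, strips only a single trailing "s" that is not part of "ss".

```php
<?php
/**
 * Build a LIKE condition and its bind values for a list of search words.
 * Hypothetical standalone version of the loop discussed above.
 * Returns [condition string, bind array].
 */
function buildLikeCondition(array $words, string $glue = ' AND '): array
{
    $like = [];
    $bind = [];
    $i    = 0;
    foreach ($words as $word) {
        // Naive singular form: drop one trailing "s" unless the word
        // ends in "ss" or is very short ("shirts" -> "shirt", "glass" stays)
        if (strlen($word) > 2) {
            $word = preg_replace('/(?<!s)s$/', '', $word);
        }
        $like[] = '`s`.`data_index` LIKE :likew' . $i;
        // The space after the first % limits matches to word starts
        $bind[':likew' . $i] = '% ' . $word . '%';
        $i++;
    }
    $cond = $like ? '(' . implode($glue, $like) . ')' : '';

    return [$cond, $bind];
}

list($cond, $bind) = buildLikeCondition(['red', 'shirts']);
echo $cond, "\n";
print_r($bind);
```

Passing ' OR ' as the second argument reproduces the original, broader behaviour, which makes it easy to compare the two.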
My next project may be to make some more changes to the string parsing to get better results for multi-word searches. FYI, I'm looking at the function splitWords in app/code/core/Mage/Core/Helper/String.php, which is used in the Fulltext.php file for string parsing.
NOTE: To make this change upgrade-safe, you would duplicate the folder structure past app/code/core inside app/code/local, like this:
app/code/local/Mage/CatalogSearch/Model/Mysql4/Fulltext.php
Just copy the core file there, and make your changes there.
You can configure search queries under System > Configuration > Catalog > Catalog Search and choose the type of your queries there.
The search code itself is located under the app/code/core/Mage/CatalogSearch folder, in the Mage_CatalogSearch_Model_Query and Mage_CatalogSearch_Model_Mysql4_Search_Collection classes.
I am using Magento CE 1.6.0.0 and I found the file in the Resource folder rather than the Mysql4 folder. Hope this helps.

PHP: Formatting irregular CSV files into HTML tables

My client receives a set of CSV text files periodically, where the elements in each row follow a consistent order and format, but the commas that separate them are inconsistent. Sometimes one comma will separate two elements and other times it will be two or four commas, etc ...
The PHP application I am writing attempts to do the following things:
PSEUDO-CODE:
1. Upload csv.txt file from client's local directory.
2. Create new HTML table.
3. Insert the first three fields FROM csv.txt into HTML table row.
4. Iterate STEP 2 while the FIRST field equals the First field below it.
5. If they do not equal, CLOSE HTML table.
6. Check to see if FIRST field is NOT NULL, IF TRUE, GOTO step 2, Else close HTML table.
I have no trouble with steps 1 and 2. Step 3 is where it gets tricky, since the fields in the csv.txt files are not always separated by the same number of commas. They are, however, always in the same relative order and format. I am also having issues with step 4: I don't know how to check whether the first field in a row matches the first field in the row below it. Step 5 should be relatively simple. For step 6, I need to find an equivalent of a "GOTO" function in PHP.
Please let me know if any part of the question is unclear. I appreciate your help.
Thank you in advance!
If you want to group the rows by their first element you can try something like:
read the next row via fgetcsv()
filter out empty elements (a,,b,c -> a,b,c)
if the row still contains fields (i.e. is not empty), append it to "its" group
That's not exactly what you've described but it may be what you want ;-)
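<?php
$fp = fopen('test.csv', 'rb') or die('!fopen');
$groups = array();
// fgetcsv() returns false at end of file, which ends the loop
while (($row = fgetcsv($fp)) !== false) {
    // drop empty fields (a,,b,c -> a,b,c) and reindex so that $row[0] exists
    $row = array_values(array_filter($row, function ($v) { return $v !== null && $v !== ''; }));
    if ( !empty($row) ) {
        // PHP creates the nested arrays on first use, so no existence check is needed
        $groups[$row[0]][] = $row;
    }
}
fclose($fp);
foreach( $groups as $g ) {
    echo "\n<table>";
    foreach( $g as $row ) {
        echo "\n<tr>\n<td>", join('</td><td>', array_map('htmlentities', $row)), "</td>\n</tr>\n";
    }
    echo '</table>';
}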
Why not simply start by going through and replacing any run of multiple commas with a single comma? E.g.:
abc,def,,ghi,,,,jkl
becomes:
abc,def,ghi,jkl
and then just continue normally.
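A minimal sketch of that clean-up step, assuming no field is quoted or contains a comma itself; splitIrregularCsvLine() is a made-up name for this example.

```php
<?php
/**
 * Collapse runs of commas into one and split the line into fields.
 * Hypothetical helper; assumes fields never contain commas or quotes.
 */
function splitIrregularCsvLine(string $line): array
{
    // "abc,def,,ghi,,,,jkl" -> "abc,def,ghi,jkl"
    $collapsed = preg_replace('/,{2,}/', ',', trim($line));
    // Drop a leading or trailing comma left over from empty end fields
    $collapsed = trim($collapsed, ',');

    return $collapsed === '' ? [] : explode(',', $collapsed);
}

print_r(splitIrregularCsvLine('abc,def,,ghi,,,,jkl'));
```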
If you mean that there are different numbers of commas on each line, then as far as I can see it is actually impossible to do what you want to do by looking at the commas alone. For example:
ab,c,d,ef // could group columns a-f in that way, but
a,bc,de,f // could also group columns a-f
... and you would have no way of knowing which was the proper arrangement, unless you're given some other instructions or the type of data is identifiable by regular expression as someone else said.
If on the other hand you just mean that sometimes there are blanks, but there are still the same number of columns, like this:
a,b,,d,e,f
a,,c,d,e,f
... then you can still form the table correctly. I would recommend using explode(',', $line) in that case and then doing your processing on the elements of the exploded array without worrying about what is inside them.
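For the fixed-column case, a short sketch of what that could look like: explode() keeps the blank fields, so every row ends up with the same number of cells. lineToTableRow() is a hypothetical helper name.

```php
<?php
/**
 * Turn one comma-separated line into an HTML table row, keeping
 * blank fields as empty cells so the columns stay aligned.
 * Hypothetical helper for illustration.
 */
function lineToTableRow(string $line): string
{
    $cells = explode(',', $line); // blank fields become empty strings
    $cells = array_map('htmlspecialchars', $cells);

    return '<tr><td>' . implode('</td><td>', $cells) . '</td></tr>';
}

echo "<table>\n";
foreach (['a,b,,d,e,f', 'a,,c,d,e,f'] as $line) {
    echo lineToTableRow($line), "\n";
}
echo "</table>\n";
```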
