PHP SEO Functions - php

I am having trouble understanding functions with variables. Here is my code. I am trying to create friendly URLs for a site that reports scams. I created a DB full of bad words to remove from the URL if they are present. If the name in the URL contains a link, I would like it to look like this: example.com-scam.php or example.com-scam.html (whichever is better). However, right now it strips the (.) and it looks like this: examplecom. How can I fix this to keep the (.) and add -scam.php or -scam.html to the end?
functions/seourls.php
/* takes the input, scrubs bad characters */
function generate_seo_link($link, $replace = '-', $remove_words = true, $words_array = array()) {
    //make it lowercase, remove punctuation, remove multiple/leading/ending spaces
    //(preg_replace is used because ereg_replace was removed in PHP 7)
    $return = trim(preg_replace('/ +/', ' ', preg_replace('/[^a-zA-Z0-9\s]/', '', strtolower($link))));
    //remove words, if not helpful to seo
    //i like my defaults list in remove_words(), so I won't pass that array
    if ($remove_words) { $return = remove_words($return, $replace, $words_array); }
    //convert the spaces to whatever the user wants
    //usually a dash or underscore..
    //...then return the value.
    return str_replace(' ', $replace, $return);
}
/* takes an input, scrubs unnecessary words */
function remove_words($link, $replace, $words_array = array(), $unique_words = true)
{
    //separate all words based on spaces
    $input_array = explode(' ', $link);
    //create the return array
    $return = array();
    //loops through words, remove bad words, keep good ones
    foreach ($input_array as $word)
    {
        //if it's a word we should add...
        if (!in_array($word, $words_array) && ($unique_words ? !in_array($word, $return) : true))
        {
            $return[] = $word;
        }
    }
    //return good words separated by $replace (usually dashes)
    return implode($replace, $return);
}
This is my test.php file:
require_once "dbConnection.php";
//note: the mysql_* functions below were removed in PHP 7; current PHP needs mysqli or PDO
$words_array = array();
$query = "select * from bad_words";
$result = mysql_query($query);
while ($record = mysql_fetch_assoc($result))
{
    $words_array[] = $record['word'];
}
//cast the id to an integer so the query string cannot inject SQL
$sql = "SELECT * FROM reported_scams WHERE id=" . (int) $_GET['id'];
$rs_result = mysql_query($sql);
$link = '';
while ($row = mysql_fetch_array($rs_result)) {
    $link = $row['business'];
}
require_once "functions/seourls.php";
echo generate_seo_link($link, '-', true, $words_array);
Any help understanding this would be greatly appreciated :) Also, why am I having to echo the function?

Your first real line of code has the comment:
//make it lowercase, remove punctuation, remove multiple/leading/ending spaces
Periods are punctuation, so they're being removed. Add . to the accepted character set if you want to make an exception.

Alter your regular expression (second line) to allow full stops:
$return = trim(preg_replace('/ +/', ' ', preg_replace('/[^a-zA-Z0-9\.\s]/', '', strtolower($link))));
Your code needs to be echoed because the function returns the string instead of printing it: return hands the value back to the caller, which can then echo it or store it in a variable. You can change return to echo/print inside the function if you want the result printed as soon as you call it.
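To also get the "-scam" suffix the question asks for, the dot has to survive the cleanup and the suffix is simply appended at the end. A minimal sketch, assuming the bad words are stored in lowercase: scam_slug() is a hypothetical helper (not part of the original code) that mirrors generate_seo_link() and remove_words() but keeps dots, and the suffix is a parameter so either -scam.html or -scam.php works.

```php
<?php
// Hypothetical helper (not part of the original code): like generate_seo_link(),
// but dots are kept and a suffix such as "-scam.html" is appended.
function scam_slug(string $name, array $badWords = [], string $suffix = '-scam.html'): string
{
    // Lowercase, then keep only letters, digits, dots and whitespace.
    $clean = preg_replace('/[^a-z0-9.\s]/', '', strtolower($name));
    // Split on runs of whitespace; PREG_SPLIT_NO_EMPTY drops leading/trailing blanks.
    $words = preg_split('/\s+/', $clean, -1, PREG_SPLIT_NO_EMPTY);
    // Drop bad words and repeated words, keeping the original order.
    $kept = [];
    foreach ($words as $word) {
        if (!in_array($word, $badWords, true) && !in_array($word, $kept, true)) {
            $kept[] = $word;
        }
    }
    // Trim stray dots from the ends, e.g. for input like "example.com."
    return trim(implode('-', $kept), '.') . $suffix;
}

// The function returns the slug, so the caller decides whether to echo it or use it in a link.
$slug = scam_slug('Example.com Scam', ['scam']);
echo '<a href="/' . $slug . '">report</a>', "\n"; // <a href="/example.com-scam.html">report</a>
```

Returning rather than printing is what lets the same slug be reused, for example in an href as above.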

Related

splitting long regex to pieces PHP

I have a very long list of names and I am using preg_replace to match if a name from the list is anywhere in the string. If I test it with few names in the regex it works fine, but having in mind that I have over 5000 names it gives me the error "preg_replace(): Compilation failed: regular expression is too large".
Somehow I cannot figure out how to split the regex into pieces so it becomes smaller (if even possible).
The list with names is created dynamically from a database. Here is my code.
$query_gdpr_names = "select name FROM gdpr_names";
$result_gdpr_names = mysqli_query($connect, $query_gdpr_names);
while ($row_gdpr_names = mysqli_fetch_assoc($result_gdpr_names))
{
    $AllNames .= '"/'.$row_gdpr_names['name'].'\b/ui",';
}
$AllNames = rtrim($AllNames, ',');
$AllNames = "[$AllNames]";
$search = preg_replace($AllNames, '****', $search);
The resulting $AllNames string looks like this (with only 3 names in this example):
$AllNames = ["/Lola\b/ui", "/Monica\b/ui", "/Chris\b/ui"];
And the test string
$search = "I am Lola and my friend name is Chris";
Any help is very appreciated.
Since it appears that you can't easily handle the replacement from PHP using a single regex alternation, one alternative would be to just iterate each name in the result set one by one and make a replacement:
while ($row_gdpr_names = mysqli_fetch_assoc($result_gdpr_names)) {
    $name = $row_gdpr_names['name'];
    // preg_quote() escapes regex characters and the "/" delimiter inside the name
    $regex = "/\b" . preg_quote($name, '/') . "\b/ui";
    $search = preg_replace($regex, '----', $search);
}
$search = preg_replace("/----/", '****', $search);
This is not the most efficient approach. Perhaps there is some way to limit the result set so that a single alternation does not get too long.
OK, I debugged a lot, even isolating everything except this part of the code:
$search = "Lola and Chris";
$query_gdpr_names = "select * FROM gdpr_names";
$result_gdpr_names = mysqli_query($connect, $query_gdpr_names);
while ($row_gdpr_names = mysqli_fetch_assoc($result_gdpr_names)) {
    $name = $row_gdpr_names['name'];
    $regex = "/\b" . $name . "\b/ui";
    $search = preg_replace($regex, '****', $search);
}
echo $search;
It still printed inside the loop but not outside it.
The problem was actually in the database records: one of them contained a slash. Because / is also the regex delimiter, that name produced an invalid pattern, preg_replace() returned null, and $search ended up empty after the loop. Escaping each name with preg_quote($name, '/') avoids this.
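To actually split the regex into pieces, as the question asks, one option is to build one alternation per chunk of names and pass the resulting array of patterns to preg_replace(), which applies them in order. A sketch under stated assumptions: build_name_patterns() and the default chunk size of 500 are hypothetical (not tuned values), and each name goes through preg_quote(), which also covers a slash like the one found in the database.

```php
<?php
// Hypothetical helper: one "\b(?:name1|name2|...)\b" pattern per chunk of names.
// $chunkSize = 500 is an assumption; lower it if PCRE still reports "too large".
function build_name_patterns(array $names, int $chunkSize = 500): array
{
    // Skip empty names: an empty alternative would match at every word boundary.
    $names = array_filter($names, fn ($n) => $n !== '');
    $patterns = [];
    foreach (array_chunk($names, $chunkSize) as $chunk) {
        // preg_quote() escapes regex metacharacters and the "/" delimiter.
        $escaped = array_map(fn ($n) => preg_quote($n, '/'), $chunk);
        $patterns[] = '/\b(?:' . implode('|', $escaped) . ')\b/ui';
    }
    return $patterns;
}

// In practice $names would be filled from the gdpr_names query.
$names = ['Lola', 'Monica', 'Chris', 'AC/DC'];
$search = 'I am Lola and my friend name is Chris';
// preg_replace() accepts an array of patterns and applies each one in turn.
echo preg_replace(build_name_patterns($names, 2), '****', $search), "\n";
// I am **** and my friend name is ****
```

With 5000 names and chunks of 500 this runs 10 regexes per string instead of 5000, while keeping each compiled pattern small.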

PHP preg_replace all text changing

I want to make some changes to the html but I have to follow certain rules.
I have source code like this:
A beautiful sentence http://www.google.com/test, You can reach here http://www.google.com/test-mi or http://www.google.com/test/aliveli
I need to convert it into the following:
A beautiful sentence http://test.google.com/, You can reach here http://www.google.com/test-mi or http://test.google.com/aliveli
I tried using str_replace:
$html = str_replace('://www.google.com/test', '://test.google.com', $html);
When I use it like this, I get an incorrect result:
A beautiful sentence http://test.google.com/, You can reach here http://test.google.com/-mi or http://test.google.com/aliveli
Wrong replacement: http://test.google.com/-mi
How can I do this with preg_replace?
With a regex you can use a word boundary and a negative lookahead so that nothing is replaced when test is followed by -:
$pattern = '~://www\.google\.com/test\b(?!-)~';
$html = preg_replace($pattern, "://test.google.com", $html);
Here is a regex demo at regex101 and a php demo at eval.in
Be aware that in a regex you need to escape certain characters with a backslash to remove their special meaning and match them literally.
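With the pattern above, "test," becomes "test.google.com," rather than the "test.google.com/," shown in the desired output. A sketch of a variant pattern (not taken from the answers) that always writes the trailing slash and swallows an existing one, so all three URLs in the example come out as requested:

```php
<?php
$html = 'A beautiful sentence http://www.google.com/test, You can reach here '
      . 'http://www.google.com/test-mi or http://www.google.com/test/aliveli';

// "test" must not be followed by a word character or a hyphen, so "test-mi" and
// "testing" stay untouched; an optional "/" after it is consumed, so that
// "test/aliveli" becomes "test.google.com/aliveli" without a doubled slash.
$pattern = '~://www\.google\.com/test(?![\w-])/?~';
$result = preg_replace($pattern, '://test.google.com/', $html);

echo $result, "\n";
```

Using ~ as the delimiter means the slashes in the URL do not need escaping.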
It seems you're turning the test subdirectory into a subdomain. Your case is fairly complicated, but I've tried to apply some logic that should work as long as your string keeps the same structure. You can give this code a try:
$html = "A beautiful sentence http://www.google.com/test, You can reach here http://www.google.com/test-mi or http://www.google.com/test/aliveli";
function set_subdomain_string($html, $subdomain_word) {
    $words = explode(' ', $html);
    // matches the subdirectory only as the first path segment: "/test" or "/test/..."
    $dir_regex = '/^\/' . preg_quote($subdomain_word, '/') . '(\/|$)/';
    foreach ($words as &$value) {
        $parse_html = parse_url($value);
        // plain words parse as a bare path; only real URLs also have a host
        if (is_array($parse_html) && isset($parse_html['host'], $parse_html['path'])) {
            // drop characters such as "," from the path, but remember the first one
            // ("_-" at the end of the class keeps "-" literal instead of forming a range)
            $path = preg_replace('/[^0-9a-zA-Z\/_-]/', '', $parse_html['path']);
            preg_match('/[^0-9a-zA-Z\/_-]/', $parse_html['path'], $match);
            if (preg_match($dir_regex, $path)) {
                $path = preg_replace($dir_regex, '/', $path);
                $parse_html['host'] = preg_replace('/^www\./', $subdomain_word . '.', $parse_html['host']);
                $parse_html['path'] = empty($match) ? $path : $path . $match[0];
                unset($parse_html['scheme']);
                // caveat: a query string or fragment would lose its "?" or "#" separator here
                $value = "http://" . implode('', $parse_html);
            }
        }
    }
    unset($value); // break the reference left over from the foreach
    return implode(' ', $words);
}
echo "<p>{$html}</p>";
$modified_html = set_subdomain_string($html, 'test');
echo "<p>{$modified_html}</p>";
Hope it helps.
If that sentence is the only case in your problem, you don't need to struggle with preg_replace.
Just change your str_replace() call to the following (with the ',' at the end of the search string):
$html = str_replace('://www.google.com/test,', '://test.google.com/,', $html);
This matches the first occurrence of the search string. For the last one in your target sentence, add this (note the '/' at the end):
$html = str_replace('://www.google.com/test/', '://test.google.com/', $html);
Update:
Use these two:
$targetStr = preg_replace("/:\/\/www\.google\.com\/test(?:\/|(?=\s))/", "://test.google.com/", $targetStr);
It matches every occurrence except the ones followed by a comma. For those, use the following:
$targetStr = preg_replace("/:\/\/www\.google\.com\/test,/", "://test.google.com/,", $targetStr);

PHP - Search, put letters written in bold

I'm trying to build a search where I type in a textbox, for example, "Mi", and it finds and shows "Mike Ross" with "Mi" in bold. However, it doesn't work with spaces: when I type "Mike" I get "Mike Ross" with "Mike" in bold, but when I type "Mike " I get "Mike Ross" with no bold at all.
The same happens with accents.
If I type "Jo", the result is "João Carlos" with "Jo" in bold. If I type "Joa", the result is "João Carlos" without any bold part. I want to ignore accents while typing but still display them in the results.
So this is my script after the SELECT:
while ($row = $result->fetch_array()) {
    $name = $row['name'];
    $array = explode(' ', trim($name));
    $array_length = count($array);
    for ($i = 0; $i < $array_length; $i++) {
        // $q_length is set earlier in the script (presumably strlen($q))
        $letters = substr($array[$i], 0, $q_length);
        if (strtoupper($letters) == strtoupper($q)) {
            $bold_name = '<strong>' . $letters . '</strong>';
            $final_name = preg_replace('~' . preg_quote($letters, '~') . '~i', $bold_name, $array[$i], 1);
            $array[$i] = $final_name;
        }
        $array[$i] = $array[$i] . " ";
    }
    foreach ($array as $t_name) {
        echo $t_name;
    }
}
Thank you for your help!
if (strtoupper($letters) == strtoupper($q))
This will never evaluate to true when $q contains a space, since explode(' ', trim($name)) removes the spaces from the letters you compare against, effectively making any value of $q that contains a space unmatchable against $letters.
Here's a quick example that does what I think you're looking for:
<?php
$q = "Mike "; // User query
$name = "Mike Ross"; // Database row value
if (stripos($name, $q) !== false) // Case-insensitive match
{
    // Case-insensitive replace of the match with the match enclosed in a strong tag;
    // preg_quote() keeps regex characters in the user's query from breaking the pattern
    $result = preg_replace('/(' . preg_quote($q, '/') . ')/i', '<strong>$1</strong>', $name);
    print_r($result);
}
// Result is
// <strong>Mike </strong>Ross
From what I can tell (a quick google for "replace accented characters PHP"), you're kind of out of luck with that one. This question provides a quick solution using strtr, and this tip uses a similar method with str_replace.
Unfortunately, these rely on predefined character sets, so incoming accents you haven't prepared for will fail. You may be better off relying on users to enter the special characters when they search, or create a new column with a "searchable" name with the accented characters replaced as best as you can, and return the real name as the "matched" display field.
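A minimal sketch of the strtr() approach for filling such a "searchable" column. The ACCENT_MAP table and searchable_name() are hypothetical and deliberately incomplete, which is exactly the limitation described above: any letter missing from the map passes through unchanged.

```php
<?php
// Hypothetical, deliberately incomplete map; extend it for the accents your data contains.
const ACCENT_MAP = [
    'á' => 'a', 'à' => 'a', 'â' => 'a', 'ã' => 'a', 'ä' => 'a',
    'é' => 'e', 'è' => 'e', 'ê' => 'e', 'ë' => 'e',
    'í' => 'i', 'ó' => 'o', 'ô' => 'o', 'õ' => 'o', 'ö' => 'o',
    'ú' => 'u', 'ü' => 'u', 'ç' => 'c', 'ñ' => 'n',
];

// Value for a separate "searchable" column: accents removed, lowercased.
function searchable_name(string $name): string
{
    // strtr() with an array replaces whole keys, so multibyte UTF-8 letters are safe here.
    return strtolower(strtr($name, ACCENT_MAP));
}

echo searchable_name('João Carlos'), "\n"; // joao carlos
```

The search then runs against this column, while the original name column is what gets displayed.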
One more Note
I found another solution that can do most of what you want, except the returned name will not have the accent. It will, however, match the accented value in the DB with a non-accented search. Modified code is:
<?php
$q = "Joa";
$name = "João Carlos";
$searchable_name = replace_accents($name);
if (stripos($searchable_name, $q) !== false)
{
    $result = preg_replace('/(' . preg_quote($q, '/') . ')/i', '<strong>$1</strong>', $searchable_name);
    print_r($result);
}
// turns e.g. "ã" into "&atilde;" and then keeps only the base letter
function replace_accents($str) {
    $str = htmlentities($str, ENT_COMPAT, "UTF-8");
    $str = preg_replace('/&([a-zA-Z])(uml|acute|grave|circ|tilde);/', '$1', $str);
    return html_entity_decode($str);
}
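If the accents should stay visible in the result, one option is to compare character by character: fold each character of the name and of the query with the same htmlentities() trick as replace_accents() above, find where the folded query occurs, and wrap the original characters at that position in <strong>. A sketch assuming valid UTF-8 input; fold_char() and bold_match() are hypothetical names, and letters outside the uml/acute/grave/circ/tilde entity families (such as ç) are not folded. Each character folds to exactly one character, which is what keeps the positions aligned.

```php
<?php
// Hypothetical helper: accent-fold one character with the htmlentities() trick
// used by replace_accents(), then lowercase it.
function fold_char(string $ch): string
{
    $ent = htmlentities($ch, ENT_COMPAT, 'UTF-8');
    $ent = preg_replace('/&([a-zA-Z])(uml|acute|grave|circ|tilde);/', '$1', $ent);
    return strtolower(html_entity_decode($ent, ENT_COMPAT, 'UTF-8'));
}

// Wrap the first accent- and case-insensitive match of $q in <strong>,
// keeping the original (accented) characters of $name in the output.
function bold_match(string $name, string $q): string
{
    // Split into characters rather than bytes (assumes valid UTF-8).
    $chars  = preg_split('//u', $name, -1, PREG_SPLIT_NO_EMPTY);
    $target = array_map('fold_char', preg_split('//u', $q, -1, PREG_SPLIT_NO_EMPTY));
    $folded = array_map('fold_char', $chars);
    $n = count($target);
    for ($i = 0; $n > 0 && $i + $n <= count($chars); $i++) {
        if (array_slice($folded, $i, $n) === $target) {
            return htmlspecialchars(implode('', array_slice($chars, 0, $i)))
                . '<strong>' . htmlspecialchars(implode('', array_slice($chars, $i, $n))) . '</strong>'
                . htmlspecialchars(implode('', array_slice($chars, $i + $n)));
        }
    }
    // No match (or empty query): return the name unchanged, but escaped.
    return htmlspecialchars($name);
}

echo bold_match('João Carlos', 'Joa'), "\n";  // <strong>Joã</strong>o Carlos
echo bold_match('Mike Ross', 'Mike '), "\n";  // <strong>Mike </strong>Ross
```

Because the whole name is compared rather than exploded into words, a trailing space in the query also works.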

Disallow * in php Search

I want to prevent users from searching the database with wildcard input such as P*.
http://www.aircrewremembered.com/DeutscheKreuzGoldDatabase/
I can't work out how to add this to the code I already have. I'm guessing using an array in the line $trimmed = str_replace("\"","'",trim($search)); is the answer, replacing the "\"" with the array, but I can't seem to find the correct way of doing this. I can get it to work if I just replace the \ with *, but then I lose the trimming of the "\" character: does this matter?
// Retrieve query variable and pass through regular expression.
// Test for unacceptable characters such as quotes, percent signs, etc.
// Trim out whitespace. If ereg expression not passed, produce warning.
$search = $_GET['q'] ?? ''; // empty string if q is missing (PHP 7+)
// check if wrapped in quotes
if (preg_match('/^(["\']).*\1$/m', $search) === 1) {
    $boolean = FALSE;
}
if (escape_data($search)) {
    //trim whitespace and additional disallowed characters from the stored variable
    $trimmed = str_replace("\"", "'", trim($search));
    $trimmed = stripslashes(str_ireplace("'", "", $trimmed));
    $prehighlight = stripslashes($trimmed);
    $prehighlight = str_ireplace("\"", "", $prehighlight);
    $append = stripslashes(urlencode($trimmed));
} else {
    $trimmed = "";
    $testquery = FALSE;
}
$display = stripslashes($trimmed);
You already said it yourself: just use arrays as parameters for str_replace:
http://php.net/manual/en/function.str-replace.php
$trimmed = str_replace( array("\"", "*"), array("'", ""), trim($search) );
Every element in the first array will be replaced with the corresponding element from the second array.
For future validation and sanitation, you might want to read about this function too:
http://php.net/manual/en/function.filter-var.php
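A small sketch of the array form applied to the question's cleanup, plus an alternative that rejects the input instead of silently editing it. clean_search() and has_wildcard() are hypothetical names; only str_replace() and strpbrk() are standard PHP.

```php
<?php
// Hypothetical helper mirroring the answer: '"' becomes "'" and '*' is removed,
// then (as in the question's code) single quotes are stripped too.
function clean_search(string $search): string
{
    // Element N of the first array is replaced by element N of the second.
    $trimmed = str_replace(['"', '*'], ["'", ''], trim($search));
    return str_replace("'", '', $trimmed);
}

// Alternative: refuse the search instead of silently editing it.
function has_wildcard(string $search): bool
{
    return strpbrk($search, '*') !== false;
}

var_dump(clean_search('  "P*" ')); // string(1) "P"
var_dump(has_wildcard('P*'));      // bool(true)
```

Rejecting the query lets you show the user a message, whereas stripping the * quietly turns "P*" into a search for "P".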
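Use $search = mysql_real_escape_string($search); it escapes (rather than removes) the characters that are special inside an SQL string literal, such as quotes and backslashes. It does not touch * (or the LIKE wildcards % and _), and the mysql_* functions were removed in PHP 7, so on current PHP use mysqli_real_escape_string() or, better, prepared statements.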

Improve my function: generate SEO friendly title

I am using this function to generate SEO-friendly titles, but I think it can be improved; does anyone want to try? It does a few things: it cleans common accented letters, checks against a "forbidden" array, and optionally checks against a database of titles in use.
/**
 * Recursive function that generates a unique "this-is-the-title123" string for use in URL.
 * Checks optionally against $table and $field and the array $forbidden to make sure it's unique.
 * Usage: the resulting string should be saved in the db with the object.
 */
function seo_titleinurl_generate($title, $forbidden = FALSE, $table = FALSE, $field = FALSE)
{
    ## 1. parse $title
    $title = clean($title, "oneline"); // remove tags and such
    $title = ereg_replace(" ", "-", $title); // replace spaces by "-"
    $title = ereg_replace("á", "a", $title); // replace special chars
    $title = ereg_replace("í", "i", $title); // replace special chars
    $title = ereg_replace("ó", "o", $title); // replace special chars
    $title = ereg_replace("ú", "u", $title); // replace special chars
    $title = ereg_replace("ñ", "n", $title); // replace special chars
    $title = ereg_replace("Ñ", "n", $title); // replace special chars
    $title = strtolower(trim($title)); // lowercase
    $title = preg_replace("/([^a-zA-Z0-9_-])/", '', $title); // only keep standard latin letters, numbers, underscores and hyphens
    ## 2. check against db (optional)
    if ($table AND $field)
    {
        $sql = "SELECT * FROM $table WHERE $field = '" . addslashes($title) . "'";
        $res = mysql_debug_query($sql);
        if (mysql_num_rows($res) > 0)
        {
            // already taken. So recursively adjust $title and try again.
            $title = append_increasing_number($title);
            $title = seo_titleinurl_generate($title, $forbidden, $table, $field);
        }
    }
    ## 3. check against $forbidden array
    if ($forbidden)
    {
        while (list ($key, $val) = each($forbidden))
        {
            // $val is the forbidden string
            if ($title == $val)
            {
                $title = append_increasing_number($title);
                $title = seo_titleinurl_generate($title, $forbidden, $table, $field);
            }
        }
    }
    return $title;
}
/**
 * Function that appends an increasing number to a string, for example "peter" becomes "peter1" and "peter129" becomes "peter130".
 * (To improve, this function could be made recursive to deal with numbers over 99999.)
 */
function append_increasing_number($title)
{
    ##. 1. Find number at end of string.
    $last1 = substr($title, strlen($title)-1, 1);
    $last2 = substr($title, strlen($title)-2, 2);
    $last3 = substr($title, strlen($title)-3, 3);
    $last4 = substr($title, strlen($title)-4, 4);
    $last5 = substr($title, strlen($title)-5, 5); // up to 5 numbers (ie. 99999)
    if (is_numeric($last5))
    {
        $last5++; // +1
        $title = substr($title, 0, strlen($title)-5) . $last5;
    } elseif (is_numeric($last4))
    {
        $last4++; // +1
        $title = substr($title, 0, strlen($title)-4) . $last4;
    } elseif (is_numeric($last3))
    {
        $last3++; // +1
        $title = substr($title, 0, strlen($title)-3) . $last3;
    } elseif (is_numeric($last2))
    {
        $last2++; // +1
        $title = substr($title, 0, strlen($title)-2) . $last2;
    } elseif (is_numeric($last1))
    {
        $last1++; // +1
        $title = substr($title, 0, strlen($title)-1) . $last1;
    } else
    {
        $title = $title . "1"; // append '1'
    }
    return $title;
}
There appears to be a race condition because you're doing a SELECT to see if the title has been used before, then returning it if not (presumably the calling code will then INSERT it into the DB). What if another process does the same thing, but it inserts in between your SELECT and your INSERT? Your insert will fail. You should probably add some guaranteed-unique token to the URL (perhaps a "directory" in the path one level higher than the SEO-friendly name, similar to how StackOverflow does it) to avoid the problem of the SEO-friendly URL needing to be unique at all.
I'd also rewrite the append_increasing_number() function to be more readable... have it programmatically determine how many numbers are on the end and work appropriately, instead of a giant if/else to figure it out. The code will be clearer, simpler, and possibly even faster.
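The str_replace suggestions above are excellent. Additionally, you can replace that last function with a single preg_replace_callback() call (preg_replace() with the /e modifier no longer works, since /e was removed in PHP 7):
function append_increasing_number($title) {
    // [0-9]* also matches an empty string at the end, so "peter" becomes "peter1";
    // the limit of 1 prevents a second, empty match right after the digits.
    return preg_replace_callback('#[0-9]*$#', function ($m) { return (string) ((int) $m[0] + 1); }, $title, 1);
}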
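You can do even better and remove the query-in-a-loop idea entirely, and do something like
"SELECT MAX($field) + 1 FROM $table WHERE $field LIKE '" . mysql_escape_string(preg_replace('#[0-9]+$#', '', $title)) . "%'";
This only works if the numeric suffix is kept in its own integer column: applied to the slug string itself, MySQL converts a value such as 'peter129' to 0, so the query would return 1 (or NULL when nothing matches).
Running SELECTs in a loop like that is just ugly.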
It looks like others have hit most of the significant points (especially regarding incrementing the suffix and executing SQL queries recursively / in a loop), but I still see a couple of big improvements that could be made.
Firstly, don't bother trying to come up with your own diacritics-to-ASCII replacements; you'll never catch them all and better tools exist. In particular, I direct your attention to iconv's "TRANSLIT" feature. You can convert from UTF-8 (or whatever encoding is used for your titles) to plain old 7-bit ASCII as follows:
$title = strtolower(strip(clean($title)));
$title = iconv('UTF-8', 'ASCII//TRANSLIT', $title);
$title = str_replace("'", "", $title);
$title = preg_replace(array("/\W+/", "/^\W+|\W+$/"), array("-", ""), $title);
Note that this also fixes a bug in your original code, where the space-to-dash replacement was called before trim(), and that it replaces every run of characters other than letters, digits and underscores with a single dash. For example, " Héllo, world's peoples!" becomes "hello-worlds-peoples". This replaces your entire section 1.
Secondly, your $forbidden loop can be rewritten to be more efficient and to eliminate recursion:
if ($forbidden)
{
    while (in_array($title, $forbidden))
    {
        $title = append_increasing_number($title);
    }
}
This replaces section 3.
Following karim79's answer, the first part can be made more readable and easier to maintain like this:
Replace
$title = ereg_replace(" ", "-", $title); // replace spaces by "-"
$title = ereg_replace("á", "a", $title); // replace special chars
$title = ereg_replace("í", "i", $title); // replace special chars
with
$replacements = array(
    ' ' => '-',
    'á' => 'a',
    'í' => 'i'
);
$title = str_replace(array_keys($replacements), array_values($replacements), $title);
The last part, where append_increasing_number() is used, looks bad. You could probably delete the whole function and just do something like:
for ($i = 1; $i < 99999; $i++) {
    //check for existence of $title . $i; if it doesn't exist, insert it and break out of the loop
}
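A minimal sketch of that loop, assuming the existence check is passed in as a callback. unique_slug() and $isTaken are hypothetical; with a real database the callback would run the SELECT, and a UNIQUE index on the column would still be needed because of the race condition mentioned in an earlier answer.

```php
<?php
// Find the first free "base", "base1", "base2", ... without recursion.
// $isTaken is a hypothetical callback returning true if a slug is already used.
function unique_slug(string $base, callable $isTaken, int $max = 99999): string
{
    if (!$isTaken($base)) {
        return $base;
    }
    for ($i = 1; $i <= $max; $i++) {
        $candidate = $base . $i;
        if (!$isTaken($candidate)) {
            return $candidate;
        }
    }
    throw new RuntimeException("No free slug for '$base' below suffix $max");
}

// In-memory stand-in for the titles table.
$taken = ['hello-world', 'hello-world1'];
$isTaken = fn (string $slug): bool => in_array($slug, $taken, true);
echo unique_slug('hello-world', $isTaken), "\n"; // hello-world2
```

Keeping the check behind a callback also makes the loop easy to test without a database, as above.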
You could lose the:
$title = ereg_replace(" ", "-", $title);
And replace those lines with the faster str_replace():
$title = str_replace(" ", "-", $title);
From the PHP manual page for str_replace():
If you don't need fancy replacing rules (like regular expressions), you should always use this function instead of ereg_replace() or preg_replace().
EDIT:
I enhanced your append_increasing_number($title) function: it does the same thing, only with no limit on the number of digits at the end (and it's prettier :) ):
function append_increasing_number($title)
{
    // walk backwards over the trailing digits, stopping at the start of the string
    $counter = strlen($title);
    while ($counter > 0 && is_numeric($title[$counter - 1])) {
        $counter--;
    }
    // an empty suffix casts to 0, so "peter" becomes "peter1" and "peter129" becomes "peter130"
    $numberPart = (int) substr($title, $counter);
    return substr($title, 0, $counter) . ($numberPart + 1);
}
You can also use arrays with str_replace(), so you could do:
$replace = array(' ', 'á');
$with = array('-', 'a');
$title = str_replace($replace, $with, $title);
The positions in the two arrays must correspond.
That should shave a few lines off, and a few milliseconds.
You'll also want to consider all punctuation; it's amazing how often different sets of `'" quotes, !, ? etc. get into URLs. I'd do a preg_replace on \W (non-word characters), keeping hyphens:
$title = preg_replace('/[^\w-]/', '', $title);
That should help you a bit.
Phil
