Modify string to seo url but keep Scandinavian letters - php

I am having trouble understanding how to keep the Norwegian letters
"æ ø å" in this preg_replace function I got for turning forum titles into SEO URLs.
My website is rendered in "iso-8859-1".
How I want it: someurl.com/read=kjøp_og_salg
How it currently looks: someurl.com/read=kj_p_og_salg
//----- The seo url function ------//
public function make_seo_name($title){
    $title = preg_replace('/[\'"]/', '', $title);
    $title = preg_replace('/[^a-zA-Z0-9]+/', '_', $title);
    $title = strtolower(trim($title, '_'));
    return $title;
}
I tried to utf8_encode/utf8_decode the $title before and after the preg_replace calls, but it didn't work.
Thank you for your time!
EDIT:
Solved. I fixed it with some help from "One Trick Pony" and ended up with this function:
public function make_seo_name($title){
    $title = utf8_encode($title);
    $title = preg_replace('/[\'"]/', '', $title);
    $title = preg_replace('/[^a-zA-Z0-9\ø\å\æ]+/', '_', $title);
    $title = strtolower(trim($title, '_'));
    return $title;
}
Note: I did NOT need to change my header from "iso-8859-1" to "UTF-8".

The '/[^a-zA-Z0-9]+/' bit is a regular expression that matches any run of one or more characters other than a through z, A through Z, or 0 through 9. The basic syntax is described on Wikipedia.
preg_replace then replaces such characters with underscores.
You can add the extra characters you want to allow to this list:
$title = preg_replace('/[^a-zA-Z0-9æøå]+/', '_', $title);

Set the document encoding to UTF-8 or ISO-8859-1, for example:
<head><meta charset="utf-8" /></head>
and add the characters to the list:
$title = preg_replace('/[^a-zA-Z0-9æøå]+/', '_', $title);
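
If the title is already UTF-8 (or has been converted to it), the same idea can be written with the /u modifier, so that the pattern and the subject are both treated as UTF-8 characters rather than raw bytes. The following is only a minimal sketch, not the asker's final code: the name make_seo_name_utf8 is hypothetical, the source file is assumed to be saved as UTF-8, and the mbstring extension is assumed to be available.

```php
<?php
// Hypothetical UTF-8 variant of make_seo_name(); assumes $title is valid UTF-8
// and that this source file is saved as UTF-8.
function make_seo_name_utf8(string $title): string
{
    // Drop single and double quotes entirely.
    $title = preg_replace('/[\'"]/', '', $title);
    // /u makes PCRE treat pattern and subject as UTF-8, so æ, ø and å
    // are matched as whole characters instead of byte by byte.
    $title = preg_replace('/[^a-zA-Z0-9æøåÆØÅ]+/u', '_', $title);
    // strtolower() only handles ASCII reliably; mb_strtolower() also lowers Æ, Ø, Å.
    return mb_strtolower(trim($title, '_'), 'UTF-8');
}

echo make_seo_name_utf8('Kjøp og salg'); // kjøp_og_salg
```

For ISO-8859-1 input, the string would first have to be converted, for example with mb_convert_encoding($title, 'UTF-8', 'ISO-8859-1'); utf8_encode() does the same conversion but is deprecated as of PHP 8.2.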

Related

Regex to change quotation mark conflict

I'm trying to write a PHP regex to filter my content on WordPress. I would like to turn straight quotation marks " " into guillemets « » padded with non-breaking spaces.
I also use a Timber (Twig) filter to achieve this.
The problem is that this filter also changes the quotes inside link and image tags.
Example:
<a href="http://www.example.com">My link</a>
becomes
<a href=« http://www.example.com »>My link</a>
What could I add to my regex to avoid this? Any help would be appreciated.
functions.php
public function add_to_twig( $twig ) {
    $twig->addExtension( new Twig_Extension_StringLoader() );
    $twig->addFilter( new Timber\Twig_Filter( 'changemarks', 'changemarks' ) );
    return $twig;
}
function changemarks( $text ) {
    $regex = '/"(.*?)"/';
    $subst = '« $1 »';
    $result = preg_replace($regex, $subst, $text);
    return $result;
}
single.twig
{{ post.content|changemarks }}
Handling HTML with a regular expression is difficult. My solution is to match only quotes that have a space before the opening " and a space after the closing ", which is the case in running text but not in attributes:
$regex = '/ "(.+?)" /';
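
Another way to keep attributes untouched is to split the content into tags and text and apply the replacement to the text parts only. The sketch below assumes reasonably well-formed HTML and a source file saved as UTF-8; the function name changemarks_outside_tags is hypothetical and is not part of Timber or Twig.

```php
<?php
// Hypothetical variant of changemarks() that skips anything inside <...>.
function changemarks_outside_tags(string $html): string
{
    // With PREG_SPLIT_DELIM_CAPTURE the captured tags are kept in the result:
    // even indexes hold text, odd indexes hold the tags themselves.
    $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    $nbsp  = "\u{00A0}"; // non-breaking space
    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            continue; // a tag: leave href="...", src="..." and so on alone
        }
        $parts[$i] = preg_replace('/"(.*?)"/', '«' . $nbsp . '$1' . $nbsp . '»', $part);
    }
    return implode('', $parts);
}

echo changemarks_outside_tags('<a href="http://www.example.com">My "link"</a>');
```

A quotation that spans a tag, such as "some <b>bold</b> text", is not matched by this sketch, because each text part is processed on its own.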

How to use a function inside a variable?

What I'm trying to do here is make use of PHP's ability to create and write to files because I have like 350 pages to make all with the same line of code that differs by one number. Much rather do this through code than manually creating 350 pages!
Each file will be (.php) and named after the title of the content it will have which has already been defined. However, as this will be the URL to reach the page, I need to format the title and use the formatted version as the filename.
This is what I've got to start with:
function seoUrl($string) {
    //Make lowercase
    $string = strtolower($string);
    //Clean up multiple dashes or whitespaces
    $string = preg_replace("/[\s-]+/", " ", $string);
    //Convert whitespaces and underscore to dash
    $string = preg_replace("/[\s_]/", "-", $string);
    return $string;
}
I found this function earlier on here and it worked perfectly for making the sitemap for all these pages. The URLs were just like I wanted. However, when I call the same function to do this for each title, I hit a snag. I assume I have the code wrong somewhere so here's a piece of the file creation code:
//Content title to be formatted for the filename
$title1="Capitalized And Spaced Title";
//Formatting
$urlfile1="seoUrl ($title1)";
//Text to be written
$txt1="<?include 'tpl/pages/1.txt'?>";
//And the create/write file code
$createfile1=fopen("$urlfile1.php", "w");
fwrite($createfile1, $txt1);
fclose($createfile1);
The code inserts the $txt values just fine, which is actually where I anticipated having a problem. But my files that are created include the function name and parenthesis, plus the title isn't formatted.
I didn't have this problem on the sitemap page:
$url1="$domainurl/$pathurl/$title1.php";
$url2="$domainurl/$pathurl/$title2.php";
...
seoUrl($url1);
seoUrl($url2);
...
<?echo $url1?><br>
<?echo $url2?><br>
...
I've tried everything I can think of for the past couple hours now. What am I doing wrong here?
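Try this; I hope it helps. PHP does not call functions inside a double-quoted string, so "seoUrl ($title1)" only interpolates $title1 and keeps the rest as literal text. Call the function directly and use its return value, and the file will be created with the properly formatted name.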
function seoUrl($string) {
    //Make lowercase
    $string = strtolower($string);
    //Clean up multiple dashes or whitespaces
    $string = preg_replace("/[\s-]+/", " ", $string);
    //Convert whitespaces and underscore to dash
    $string = preg_replace("/[\s_]/", "-", $string);
    return $string;
}
$title1 = "Capitalized And Spaced Title";
//Formatting
$urlfile1 = seoUrl($title1);
//Text to be written
$txt1 = "<?include 'tpl/pages/1.txt'?>";
//And the create/write file code
$fileName = "" . $urlfile1 . ".php";
$createfile1 = fopen($fileName, "w");
fwrite($createfile1, $txt1);
fclose($createfile1);
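
Since the goal is several hundred pages that differ only by a number, the same call can be put in a loop. This is a rough sketch under assumptions: the $titles array and the buildPages helper are hypothetical, and the template path simply follows the tpl/pages/1.txt pattern from the question.

```php
<?php
function seoUrl(string $string): string
{
    $string = strtolower($string);                     // make lowercase
    $string = preg_replace("/[\s-]+/", " ", $string);  // collapse dashes and whitespace
    return preg_replace("/[\s_]/", "-", $string);      // whitespace and underscore to dash
}

// Hypothetical helper: maps each page number and title to a file name and its content.
function buildPages(array $titles): array
{
    $pages = [];
    foreach ($titles as $n => $title) {
        $pages[seoUrl($title) . '.php'] = "<?php include 'tpl/pages/$n.txt'; ?>";
    }
    return $pages;
}

// Example titles (placeholders), keyed by page number.
$titles = [1 => 'Capitalized And Spaced Title', 2 => 'Another Page Title'];
$pages  = buildPages($titles);
```

Writing the files is then one call per entry, for example foreach ($pages as $fileName => $txt) { file_put_contents($fileName, $txt); }.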

text only from title to make seo url

I am working on code where I upload HTML, and the same HTML is stored as the content, with its first characters used as the title and SEO URL.
My problem is building the title: I am unable to get only plain text from the HTML string to use as the title and SEO URL.
Below is my code to get the title from the HTML text:
$title = getplaintextintrofromhtml($str,100);
$title = str_replace(PHP_EOL, '', $title);
$title = str_replace(" "," ", $title);
$title = str_replace(str_split('\\/:*?"<>|,+=-'), '', $title);
$title = str_replace("'","", $title);
$title = str_replace("<br>","", $title);
$title = str_replace("\n","", $title);
$title = trim($title);
The SEO URL:
$newurltitle=str_replace(" ","-",$title);
and the function:
function getplaintextintrofromhtml($html, $numchars) {
    // Remove the HTML tags
    $html = strip_tags($html);
    // Convert HTML entities to single characters
    $html = html_entity_decode($html, ENT_QUOTES, 'UTF-8');
    // Make the string the desired number of characters
    // Note that substr is not good as it counts by bytes and not characters
    $html = mb_substr($html, 0, $numchars, 'UTF-8');
    // Return the truncated text (no ellipsis is appended)
    return $html;
}
Even with the code above, I still get titles containing newlines. I cannot figure out why: I am getting plain text, but the newlines are still there, so I cannot use the titles to build the SEO URL.
You can use the following code to collapse newlines, carriage returns, tabs, and repeated spaces into a single space:
$title = preg_replace('/\s+/', ' ', $title);
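
Put together with the steps from the question, the whole title clean-up could look roughly like the sketch below. It assumes UTF-8 input and the mbstring extension; the name plainTitle is hypothetical. The non-breaking space is listed explicitly because html_entity_decode() turns &nbsp; into U+00A0.

```php
<?php
// Hypothetical helper combining tag stripping, entity decoding and whitespace clean-up.
function plainTitle(string $html, int $numChars = 100): string
{
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
    // Collapse every run of whitespace (including \r, \n, tabs and U+00A0) into one space.
    // preg_replace() returns null on invalid UTF-8, so fall back to the unmodified text.
    $text = preg_replace('/[\s\x{00A0}]+/u', ' ', $text) ?? $text;
    return mb_substr(trim($text), 0, $numChars, 'UTF-8');
}

$title       = plainTitle("<p>Hello\r\n  world&nbsp;again</p><br>\n");
$newurltitle = str_replace(' ', '-', $title);
echo $newurltitle; // Hello-world-again
```

Collapsing whitespace before trimming and truncating means a stray "\r" left behind by Windows line endings cannot survive into the URL.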

Replace url strings in PHP

I have a string for example : I am a boy
I want to show this on my url for example in this way : index.php?string=I-am-a-boy
My program :
$title = "I am a boy";
$number_wrds = str_word_count($title);
if($number_wrds > 1){
    $url = str_replace(' ','-',$title);
}else{
    $url = $title;
}
What if I have a string : Destination - Silicon Valley
If I implement the same logic my url will be : index.php?string=Destination---Silicon-Valley
But I want to show only 1 hyphen.
I want to show a hyphen instead of a plus sign.
urlencode() would insert plus symbols, so it does not help here.
If I use a hyphen and the actual string is Destination - Silicon Valley, the URL should look like
Destination-Silicon-Valley and not
Destination---Silicon-Valley
Check this stackoverflow question title and the url. You will know what I am saying.
Check this
Use urlencode() to send strings along with an url:
$url = 'http://your.server.com/?string=' . urlencode($string);
You said in the comments that you don't want urlencode() and just want to replace spaces with - characters.
First, you should just do it: the if conditional and str_word_count() are pure overhead. Your example should basically look like this:
$title = "I am a boy";
$url = str_replace(' ','-', $title);
That's it.
You also said this causes problems if the original string already contains a -. I would use preg_replace() instead of str_replace() to solve that, like this:
$string = 'Destination - Silicon Valley';
// replace spaces by hyphen and
// group multiple hyphens into a single one
$string = preg_replace('/[ -]+/', '-', $string);
echo $string; // Destination-Silicon-Valley
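
Wrapped in a function, the same pattern can also cover tabs and newlines and strip hyphens left at either end. A minimal sketch; the function name hyphenate is made up for illustration.

```php
<?php
// Hypothetical helper: one hyphen per run of whitespace and/or hyphens.
function hyphenate(string $title): string
{
    $slug = preg_replace('/[\s-]+/', '-', $title);
    // Remove hyphens produced by leading or trailing separators.
    return trim($slug, '-');
}

echo hyphenate('Destination - Silicon Valley'); // Destination-Silicon-Valley
```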
Use preg_replace instead:
$url = preg_replace('/\s+/', '-', $title);
\s+ means "one or more whitespace characters" (space, tab, line feed, carriage return, form feed). Note that this pattern does not merge hyphens that are already in the string.
Use urlencode():
<?php
$s = "i am a boy";
echo urlencode($s) . "\n";
$s = "Destination - Silicon Valley";
echo urlencode($s);
?>
Output:
i+am+a+boy
Destination+-+Silicon+Valley
and urldecode():
<?php
$s = "i+am+a+boy";
echo urldecode($s)."\n";
$s = "Destination+-+Silicon+Valley";
echo urldecode($s);
?>
Output:
i am a boy
Destination - Silicon Valley
Just use urlencode() and urldecode(). They are meant for sending data with GET in the URL.
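
If the plus signs are the objection, PHP also ships rawurlencode(), which encodes a space as %20 instead of +. The short comparison below is only illustrative; the variable names are arbitrary.

```php
<?php
$title = 'Destination - Silicon Valley';

$plus    = urlencode($title);    // spaces become "+"
$percent = rawurlencode($title); // spaces become "%20"

echo $plus . "\n";    // Destination+-+Silicon+Valley
echo $percent . "\n"; // Destination%20-%20Silicon%20Valley

// Both forms decode back to the original string.
$roundTrip = urldecode($plus) === $title && rawurldecode($percent) === $title;
```

Neither function produces the hyphenated form the question asks for; they only make an arbitrary string safe to place in a URL.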

removing strange characters from php string

This is what I have right now.
I am pulling an RSS feed into PHP. The raw XML from the feed reads:
Paul’s Confidence
The PHP that I have so far is this:
$newtitle = $item->title;
$newtitle = utf8_decode($newtitle);
The above returns:
Paul?s Confidence
If I remove the utf8_decode(), I get this:
Paul’s Confidence
When I try a str_replace():
$newtitle = str_replace("”", "", $newtitle);
it doesn't work; I get:
Paul’s Confidence
Any thoughts?
This is my function. It works byte by byte, so it behaves the same regardless of encoding: it keeps printable ASCII (plus byte 163, which is £ in ISO-8859-1) and drops everything else:
function RemoveBS($Str) {
    $StrArr = str_split($Str);
    $NewStr = '';
    foreach ($StrArr as $Char) {
        $CharNo = ord($Char);
        if ($CharNo == 163) { $NewStr .= $Char; continue; } // keep £
        if ($CharNo > 31 && $CharNo < 127) {
            $NewStr .= $Char;
        }
    }
    return $NewStr;
}
Example:
echo RemoveBS('Hello õhowå åare youÆ?'); // Hello how are you?
Try this:
$newtitle = html_entity_decode($newtitle, ENT_QUOTES, "UTF-8");
If this is not the solution, browse this page: http://us2.php.net/manual/en/function.html-entity-decode.php
This will remove all non-ASCII and control characters from a string.
// Remove from a single-line string
$output = "Likening ‘not-critical’ with";
$output = preg_replace('/[^\x20-\x7F]+/', '', $output);
echo $output;
// Remove from a multi-line string (line feeds and carriage returns are kept)
$output = "Likening ‘not-critical’ with \n Likening ‘not-critical’ with \r Likening ‘not-critical’ with. ' ! -.";
$output = preg_replace('/[^\x20-\x7F\x0A\x0D]+/', '', $output);
echo $output;
I solved the problem. It seems to be a quick fix rather than a solution to the larger issue, but it works.
$newtitle = str_replace('’', "'", $newtitle);
I also found this useful snippet that may help others with the same problem:
<?php
// str_replace() applies the pairs in order, so the shortest sequence ('â€')
// goes last; otherwise it would match the start of every other entry.
$find[] = '“'; // left side double smart quote
$find[] = '‘'; // left side single smart quote
$find[] = '’'; // right side single smart quote
$find[] = '…'; // ellipsis
$find[] = '—'; // em dash
$find[] = '–'; // en dash
$find[] = 'â€'; // right side double smart quote
$replace[] = '"';
$replace[] = "'";
$replace[] = "'";
$replace[] = "...";
$replace[] = "-";
$replace[] = "-";
$replace[] = '"';
$text = str_replace($find, $replace, $text);
?>
Thanks everyone for your time and consideration.
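
Sequences like ’ typically appear when UTF-8 bytes have been read as Windows-1252 and then saved as UTF-8 again. When that is really what happened, the damage can sometimes be reversed in one step instead of character by character. This is a different technique from the lookup table above and only a sketch: repairMojibake is a hypothetical name, mbstring is assumed, and the conversion should only be applied to text known to be double-encoded.

```php
<?php
// Hypothetical helper: undo "UTF-8 read as Windows-1252" double encoding.
function repairMojibake(string $text): string
{
    // Map each character back to the single Windows-1252 byte it was misread from;
    // the resulting byte sequence is the original UTF-8 text.
    return mb_convert_encoding($text, 'Windows-1252', 'UTF-8');
}

// "Paul" + â (U+00E2) + € (U+20AC) + ™ (U+2122) + "s Confidence"
$broken = "Paul\u{00E2}\u{20AC}\u{2122}s Confidence";
$fixed  = repairMojibake($broken); // "Paul’s Confidence" with a real U+2019
```

Characters whose third byte has no Windows-1252 equivalent (the right double quote is one) may not survive the round trip, and text that was never double-encoded will be damaged by this conversion.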
Yeah this is not working for me. What is the workaround for this? – vaichidrewar Mar 12 at 22:29
Add this to the HTML head (or modify it if it is already there):
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
This tells the browser to read the page as UTF-8, so characters like "“" are displayed properly instead of as garbled byte sequences.
Or you can do this:
ini_set('default_charset', 'utf-8');
Is the character encoding setting for your PHP server something other than UTF-8? If so, is there a reason or could it be changed to UTF-8? Though we don't store data in UTF-8 in our database, I've found that setting the webserver's character set to UTF-8 seems to help resolve character set issues.
I'd be interested in hearing others' opinions about this... whether I'm setting myself up for problems by setting webserver to UTF-8 while storing submitted data in Latin1 in our mysql database. I know there was a reason I chose Latin1 for the database but can't recall what it was. Interestingly, our current setup seems to allow for non-UTF-8 character entry and subsequent rendering... it seems that storing in Latin1 doesn't prevent subsequent decoding and display of all UTF-8 characters?
Use the PHP code below to remove them:
html_entity_decode(mb_convert_encoding(stripslashes($name), "HTML-ENTITIES", 'UTF-8'));
Read up on http://us.php.net/manual/en/function.html-entity-decode.php
That & symbol starts an HTML entity, so you can easily decode it.
A very simple solution is to have the characters decoded as UTF-8 when the page is loaded.
Simply copy and paste the following at the beginning of the script:
header('Content-Type: text/html; charset=UTF-8');
mb_internal_encoding('UTF-8');
mb_http_output('UTF-8');
mb_http_input('UTF-8');
mb_regex_encoding('UTF-8');
Reference: http://php.net/manual/en/function.mb-internal-encoding.php
comment left by webfav at web dot de
Many strange characters can be removed by calling
mysqli_set_charset($con,"utf8");
right after the MySQL connection code.
In some cases, though, removing strange characters like â€ needs:
$title = ' Stefen Suraj';
$newtitle = preg_replace('/[^\x20-\x7F]+/', '', $title);
echo $newtitle;
Output will be: Stefen Suraj
That does not work.
You need to use
$arr1 = str_split($str);
then loop over the result with foreach and
echo($arr1[$k]);
This will show you exactly which characters are written into the string.
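
Echoing raw bytes to a browser can hide the problem again, so printing their hexadecimal values is often easier to read. A small sketch; dumpBytes is a hypothetical helper name.

```php
<?php
// Hypothetical helper: show every byte of a string as two hex digits.
function dumpBytes(string $str): string
{
    $out = [];
    foreach (str_split($str) as $byte) {
        $out[] = sprintf('%02X', ord($byte));
    }
    return implode(' ', $out);
}

// A correctly encoded UTF-8 right single quote shows up as E2 80 99.
echo dumpBytes("Paul\u{2019}s"); // 50 61 75 6C E2 80 99 73
```

If the same title dumps as C3 A2 E2 82 AC E2 84 A2 where the quote should be, the text has been encoded to UTF-8 twice.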
Please try this.
$find[] = '/“/'; // '“' left side double smart quote
$find[] = '/”/'; // 'â€' right side double smart quote
$find[] = '/‘/'; // '‘' left side single smart quote
$find[] = '/’/'; // '’' right side single smart quote
$find[] = '/â€&#133/'; // '…' ellipsis
$find[] = '/‖/'; // '—' em dash
$find[] = '/–/'; // '–' en dash
$replace[] = '“'; // '"'
$replace[] = '”'; // '"'
$replace[] = '‘'; // "'"
$replace[] = '’'; // "'"
$replace[] = '⋯'; // "..."
$replace[] = '—'; // "-"
$replace[] = '–'; // "-"
// The patterns carry regex delimiters, so preg_replace() is needed here
$text = preg_replace($find, $replace, $text);
1. The order of the strings in the $find array is significant.
2. The string "‘" should contain a tilde and look like three characters. If I save the .php file with my Genie editor, it gets changed to just two characters, "â€".
3. This is a useful reference: https://www.i18nqa.com/debug/utf8-debug.html
<?php
$text = "‘’“â€1‘ 2’ 3â€â€œâ€™â€˜ 4’ 5 6 7’ ‘, ’, “, â€â€˜";
echo($text . "<br>");
$find = array("‘", "’", "“", "â€");
$replace = array("‘", "’", "“", "”");
$text = str_replace($find, $replace, $text);
echo($text);
?>
Just one simple solution.
If your string contains these kinds of strange characters,
say $text contains some of them, just do as shown below:
$mytext = mb_convert_encoding($text, "HTML-ENTITIES", 'UTF-8');
and it will work.
