php/regex: "linkify" blog titles - php

I'm trying to write a simple PHP function that can take a string like
Topic: Some stuff, Maybe some more, it's my stuff?
and return
topic-some-stuff-maybe-some-more-its-my-stuff
As such:
lowercase
remove all non-alphanumeric non-space characters
replace all spaces (or groups of spaces) with hyphens
Can I do this with a single regex?

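function Slug($string)
{
    // Encode accented characters as named entities (e.g. "ñ" becomes "&ntilde;")
    $string = htmlentities($string, ENT_QUOTES, 'UTF-8');
    // Keep only the base letter(s) of each accent entity ("&ntilde;" becomes "n")
    $string = preg_replace('~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i', '$1', $string);
    // Decode whatever entities are left
    $string = html_entity_decode($string, ENT_QUOTES, 'UTF-8');
    // Replace every run of non-alphanumeric characters with a hyphen
    $string = preg_replace('~[^0-9a-z]+~i', '-', $string);
    return strtolower(trim($string, '-'));
}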
$topic = 'Iñtërnâtiônàlizætiøn';
echo Slug($topic); // internationalizaetion
$topic = 'Topic: Some stuff, Maybe some more, it\'s my stuff?';
echo Slug($topic); // topic-some-stuff-maybe-some-more-it-s-my-stuff
$topic = 'here عربي‎ Arabi';
echo Slug($topic); // here-arabi
$topic = 'here 日本語 Japanese';
echo Slug($topic); // here-japanese

Many frameworks provide functions for this
CodeIgniter:
http://bitbucket.org/ellislab/codeigniter/src/c39315f13a76/system/helpers/url_helper.php#cl-472
wordpress (has many more in the code):
http://core.trac.wordpress.org/browser/trunk/wp-includes/formatting.php#L814

You can do it with one preg_replace:
// Note: the /e modifier was removed in PHP 7.0, so this call only runs on older versions
preg_replace(array("/[A-Z]/e", "/\\p{P}/", "/\\s+/"),
array('strtolower("$0")', '', '-'), $str);
Technically, you could do it with one regex, but this is simpler.
Preemptive response: yes, it unnecessarily uses regular expressions (though very simple ones) and an unnecessarily large number of calls to strtolower, and it doesn't consider non-English characters (the OP doesn't even give an encoding); I'm just satisfying the OP's requirements.
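On PHP 7 and later, where the /e modifier is no longer available, the same three steps (lowercase, strip punctuation, hyphenate whitespace) can be sketched as below. This is a minimal sketch rather than the answerer's code: the function name linkify_title is hypothetical, it assumes ASCII titles, and it only strips characters PCRE classifies as punctuation (symbols such as "$" or "+" are left alone).

```php
<?php
// Hypothetical helper; assumes an ASCII title.
function linkify_title(string $str): string
{
    // Lowercase once, instead of once per matched capital letter.
    $str = strtolower($str);
    // Remove punctuation, then turn each run of whitespace into one hyphen.
    $str = preg_replace(array('/\p{P}/', '/\s+/'), array('', '-'), $str);
    // Drop hyphens left over at either end.
    return trim($str, '-');
}

echo linkify_title("Topic: Some stuff, Maybe some more, it's my stuff?");
// topic-some-stuff-maybe-some-more-its-my-stuff
```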

Why are regular expressions considered the universal panacea for all life's problems (just because a lowly backtrace in a preg_match has discovered the cure for cancer)? Here's a solution without recourse to regexp:
$str = "Topic: Some stuff, Maybe some more, it's my stuff?";
$str = implode('-',str_word_count(strtolower($str),2));
echo $str;
Without going the whole UTF-8 route:
$str = "Topic: Some stuff, Maybe some more, it's my Iñtërnâtiônàlizætiøn stuff?";
$str = implode('-',str_word_count(strtolower(str_replace("'","",$str)),2,'Þßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ'));
echo $str;
gives
topic-some-stuff-maybe-some-more-its-my-iñtërnâtiônàlizætiøn-stuff

Related

PHP Preg_Replace REGEX BB-Code

So I have created this function in PHP to output text in the required form. It is a simple BB-Code system. I have cut out the other BB-Codes from it to keep it shorter (Around 15 cut out)
My issue is the final one [title=blue]Test[/title] (Test data) does not work. It outputs exactly the same. I have tried 4-5 different versions of the REGEX code and nothing has changed it.
Does anyone know where I am going wrong or how to fix it?
function bbcode_format($str){
$str = htmlentities($str);
$format_search = array(
'#\[b\](.*?)\[/b\]#is',
'#\[title=(.*?)\](.*?)\[/title\]#i'
);
$format_replace = array(
'<strong>$1</strong>',
'<div class="box_header" id="$1"><center>$2</center></div>'
);
$str = preg_replace($format_search, $format_replace, $str);
$str = nl2br($str);
return $str;
}
Change the delimiter # to /. And change "/[/b\]" to "\[\/b\]". You need to escape the "/" since you need it as literal character.
Maybe the "array()" should use brackets: "array[]".
Note: I borrowed the answer from here: Convert BBcode to HTML using JavaScript/jQuery
Edit: I forgot that "/" isn't a metacharacter so I edited the answer accordingly.
Update: I wasn't able to make it work with function, but this one works. See the comments. (I used the fiddle on the accepted answer for testing from the question I linked above. You may do so also.) Please note that this is JavaScript. You had PHP code in your question. (I can't help you with PHP code at least for awhile.)
$str = 'this is a [b]bolded[/b], [title=xyz xyz]Title of something[/title]';
//doesn't work (PHP function)
//$str = htmlentities($str);
//notes: lose the single quotes
//lose the text "array" and use brackets
//don't know what "ig" means but doesn't work without them
$format_search = [
/\[b\](.*?)\[\/b\]/ig,
/\[title=(.*?)\](.*?)\[\/title\]/ig
];
$format_replace = [
'<strong>$1</strong>',
'<div class="box_header" id="$1"><center>$2</center></div>'
];
// Perform the actual conversion
for (var i =0;i<$format_search.length;i++) {
$str = $str.replace($format_search[i], $format_replace[i]);
}
//place the formatted string somewhere
document.getElementById('output_area').innerHTML=$str;
Update2: Now with PHP... (Sorry, you have to format the $replacements to your liking. I just added some tags and text to demonstrate the changes.) If there's still trouble with the "title", see what kind of text you are trying to format. I made the "=" in the title optional with ?, so it should work properly with texts like "[title=id with one or more words]Title with id[/title]" and "[title]Title without id[/title]". Not sure, though, whether the id attribute is allowed to have spaces; I guess not: http://reference.sitepoint.com/html/core-attributes/id.
$str = '[title=title id]Title text[/title] No style, [b]Bold[/b], [i]emphasis[/i], no style.';
//try without this if there's trouble
$str = htmlentities($str);
//"#" works as a delimiter in PHP (not sure about JS), so there is no need to escape the "/" with a "\"
$patterns = array();
$patterns = array(
'#\[b\](.*?)\[/b\]#',
'#\[i\](.*?)\[/i\]#', //delete this row if you don't need emphasis style
'#\[title=?(.*?)\](.*?)\[/title\]#'
);
$replacements = array();
$replacements = array(
'<strong>$1</strong>',
'<em>$1</em>', // delete this row if you don't need emphasis style
'<h1 id="$1">$2</h1>'
);
//perform the conversion
$str = preg_replace($patterns, $replacements, $str);
echo $str;
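On the question of spaces in the id: an HTML id value must not contain whitespace, so one option is to clean the captured text before it goes into the attribute. The following is only a sketch under that assumption; bbcode_title is a hypothetical name, and the choice to turn disallowed characters into hyphens is mine, not the original poster's.

```php
<?php
// Hypothetical helper: converts [title=...]...[/title] and cleans up the id.
function bbcode_title(string $str): string
{
    $str = htmlentities($str);
    return preg_replace_callback(
        '#\[title=?(.*?)\](.*?)\[/title\]#is',
        function (array $m): string {
            // Collapse anything that is not a letter, digit, "_" or "-" into a hyphen.
            $id = trim(preg_replace('/[^A-Za-z0-9_-]+/', '-', $m[1]), '-');
            // Leave the attribute out entirely when no id was given.
            $attr = $id === '' ? '' : ' id="' . $id . '"';
            return '<h1' . $attr . '>' . $m[2] . '</h1>';
        },
        $str
    );
}

echo bbcode_title('[title=title id]Title text[/title]');
// <h1 id="title-id">Title text</h1>
```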

regex for breadcrumb in php

I am currently building breadcrumb. It works for example for
http://localhost/researchportal/proposal/
<?php
$landing = "http://localhost/";
$surl = ''; // accumulated path; must exist before the first .=
$url_comp = explode('/', substr($url, 1, -1));
$end = count($url_comp);
print_r($url_comp);
foreach ($url_comp as $breadcrumb) {
    $surl .= $breadcrumb . '/';
    if (--$end)
        echo '
<a href="' . $landing . $surl . '">' . $breadcrumb . '</a>»';
    else
        echo '
' . $breadcrumb;
}
?>
But when I typed in http://localhost////researchportal////proposal//////////
All the formatting was gone as it confuses my code.
I need to have the site path in an array like ([1]->researchportal, [2]->proposal)
regardless of how many slashes I put.
So can $url_comp = explode('/',substr($url,1,-1)); be turned into a regular expression to get my desired output?
You don't need a regex. Look at htmlentities() and stripslashes() in the PHP manual. A match with preg_match() only tells you whether the string matches the pattern, which on its own won't really help you achieve what you are trying to do: all it lets you say is "if the string matches, do something". If you put in a regex requiring at least 2 characters between each slash, then any time anyone puts more than one consecutive slash in there, the if statement will fail.
http://ca3.php.net/manual/en/function.stripslashes.php
http://ca3.php.net/manual/en/function.htmlentities.php
Found this on the php manual.
It uses simple str_replace statements, modifying this should achieve exactly what your post was asking.
<?php
function stripslashes2($string) {
$string = str_replace("\\\"", "\"", $string);
$string = str_replace("\\'", "'", $string);
$string = str_replace("\\\\", "\\", $string);
return $string;
}
?>
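Since stripslashes() deals with backslashes rather than the forward slashes in the question, it may help to see the array the asker describes built directly. A minimal sketch, assuming $path holds only the path part of the URL (as $url appears to in the question); path_segments is a hypothetical name, and the result is 0-indexed rather than starting at 1.

```php
<?php
// Hypothetical helper: splits a path on runs of slashes.
function path_segments(string $path): array
{
    // "/+" treats any number of consecutive slashes as one separator;
    // PREG_SPLIT_NO_EMPTY drops the empty pieces at the start and end.
    return preg_split('#/+#', $path, -1, PREG_SPLIT_NO_EMPTY);
}

print_r(path_segments('////researchportal////proposal//////////'));
// Array ( [0] => researchportal [1] => proposal )
```

The same result can be had without a regex via array_values(array_filter(explode('/', $path), 'strlen')).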

preg_replace need help with expression

This is my code:
$string = '« PreviousNext »';
$string = htmlspecialchars($string, ENT_COMPAT, 'UTF-8');
$string = preg_replace('#(<a).*?(nextlink)#s', '', $string);
echo $string;
I am trying to remove the last link:
Next »';
My current output:
">Next »</a>
It removes everything from the start.
I want it to remove only the one with strpos, is this possible with preg_replace and how?
Thanks.
Quite a tricky question to solve.
First off, the .*? will not match the way you are expecting it to.
It starts from the left, finds the first match for <a, then searches until it finds nextlink, which essentially picks up the entire string.
For that regex to work as you wanted, it would need to match from the right-hand side first and work backwards through the string, finding the smallest (non-greedy) match.
I couldn't see any modifiers that would do this,
so I opted for a callback on each link that checks for and removes any link with nextlink in it.
<?php
$string = '« PreviousNext »';
echo "RAW: $string\r\n\r\n";
$string = htmlspecialchars($string, ENT_COMPAT, 'UTF-8');
echo "SRC: $string\r\n\r\n";
$string = preg_replace_callback(
'#&lt;a.+?&lt;/a&gt;#', // $string was escaped by htmlspecialchars() above, so match the escaped tags
'remove_nextlink',
$string
);
function remove_nextlink($matches) {
// if you want to see each line as it works, uncomment this
// echo "L: $matches[0]\r\n\r\n";
if (strpos($matches[0], 'nextlink') === FALSE) {
return $matches[0]; // doesn't contain nextlink, put original string back
} else {
return ''; // contains nextlink, replace with blank
}
}
echo "PROCESSED: $string\r\n\r\n";
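As an alternative to the callback, the pattern itself can be stopped from running across an earlier link by refusing to step over a second "<a" on the way to nextlink. This is a sketch only: the HTML below is made up for illustration (the class names prevlink and nextlink and the href values are assumptions, since the original markup is not shown), and it works on the raw HTML before htmlspecialchars() is applied.

```php
<?php
// Hypothetical helper: removes the <a> element that contains "nextlink".
function remove_nextlink_anchor(string $html): string
{
    // (?:(?!<a\b).)*? advances one character at a time, but never past the
    // start of another <a, so the match cannot begin at an earlier link.
    return preg_replace('#<a\b(?:(?!<a\b).)*?nextlink.*?</a>#s', '', $html);
}

// Made-up input for illustration.
$html = '<a class="prevlink" href="/p1">« Previous</a><a class="nextlink" href="/p3">Next »</a>';
echo remove_nextlink_anchor($html);
// <a class="prevlink" href="/p1">« Previous</a>
```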
Note: This is not a direct answer, but a suggestion to another approach.
I was told once; if you can do it in any other way, stay away from regex. I don't though, it's my white whale. Have you heard of phpQuery? It's jQuery implemented in PHP and very powerful. It would be able to do what you want in a very easy way. I know it's not regex, but perhaps it's of use to you.
If you really want to go ahead, I can recommend http://gskinner.com/RegExr/ . I think it's a great tool.

Turn String With Spaces and Characters Into URL-ready Address (like Wordpress, etc) Using PHP

I've built a custom CMS that does the usual things: post management, content management, contact management, etc.
In the post management section, I would like to extract the "Title" field and convert this into a URL-ready form.
Example: New post is created titled "3 Ways to Win in Real Estate & in Life". I want this to run through a PHP script that turns it into "3_ways_to_win_in_real_estate_&_in_life".
Anyone have a script for this, or would url_encode() do all of this for me?
Make use of existing code that you can reuse within your own projects.
The Kohana 3 framework has a solution for you. Below is a function based on the URL::title() method from Kohana 3:
function title($title, $separator = '-') {
// Remove all characters that are not the separator, letters, numbers, or whitespace
$title = preg_replace('![^' . preg_quote($separator) . '\pL\pN\s]+!u', '', strtolower($title));
// Replace all separator characters and whitespace by a single separator
$title = preg_replace('![' . preg_quote($separator) . '\s]+!u', $separator, $title);
// Trim separators from the beginning and end
return trim($title, $separator);
}
function cleanURL($string)
{
$url = str_replace("'", '', $string);
$url = str_replace('%20', ' ', $url);
$url = preg_replace('~[^\\pL0-9_]+~u', '-', $url); // substitutes anything but letters, numbers and '_' with separator
$url = trim($url, "-");
$url = iconv("utf-8", "us-ascii//TRANSLIT", $url); // you may opt for your own custom character map for encoding.
$url = strtolower($url);
$url = preg_replace('~[^-a-z0-9_]+~', '', $url); // keep only letters, numbers, '_' and separator
return $url;
}
// echo cleanURL("Shelly's%20Greatest%20Poem%20(2008)"); // shellys-greatest-poem-2008
from here. You can write your own or possibly find one to replace things like & with and, and so on.
Also note that this function uses dashes, not underscores. The preferred way to create clean URLs is with dashes, not underscores.
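Replacing symbols such as & with words, as suggested above, could look something like the following. It is a minimal sketch: slug_with_words is a hypothetical name, the symbol-to-word map is an assumption you would adapt, and it only handles ASCII titles.

```php
<?php
// Hypothetical helper; the word map is an example, not a standard list.
function slug_with_words(string $title, string $separator = '-'): string
{
    // Spell out a few symbols before everything non-alphanumeric is dropped.
    $words = array('&' => ' and ', '@' => ' at ', '%' => ' percent ');
    $title = strtolower(strtr($title, $words));
    // Replace each run of other characters with a single separator.
    $title = preg_replace('/[^a-z0-9]+/', $separator, $title);
    return trim($title, $separator);
}

echo slug_with_words('3 Ways to Win in Real Estate & in Life');
// 3-ways-to-win-in-real-estate-and-in-life
```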
This is basic, but works.
static public function slugify($text)
{
// replace all non letters or digits by -
$text = preg_replace('/\W+/', '-', $text);
// trim and lowercase
$text = strtolower(trim($text, '-'));
return $text;
}
From here:
http://www.symfony-project.org/jobeet/1_4/Doctrine/en/05
"Anyone have a script for this, or would url_encode() do all of this for me?"
Have you tried using urlencode() (the function is spelled without the underscore) to do this for you? A quick test script would have revealed that much for you, or even using functions-online.com's urlencode() tester.
$str = '3 Ways to Win in Real Estate & in Life';
echo urlencode( $str );
// 3+Ways+to+Win+in+Real+Estate+%26+in+Life
You could use a simple preg_replace() and simply replace anything that is not a letter or digit with either an underscore or a dash.
echo preg_replace( '/[^\d\w]+/' , '_' , $str );
// 3_Ways_to_Win_in_Real_Estate_in_Life
echo preg_replace( '/[^\d\w]+/' , '-' , $str );
// 3-Ways-to-Win-in-Real-Estate-in-Life
Just use a dash of str_replace to turn the spaces into underscores, and a sprinkle of urlencode to catch the rest.
Edit: I missed the strtolower part, but I think you had a handle on that.
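Put together, that recipe might look like the sketch below (underscore_title is a hypothetical name). Note that urlencode() percent-encodes the ampersand, so the result is close to, but not exactly, the string asked for in the question.

```php
<?php
// Hypothetical helper: lowercase, spaces to underscores, then urlencode the rest.
function underscore_title(string $title): string
{
    return urlencode(str_replace(' ', '_', strtolower($title)));
}

echo underscore_title('3 Ways to Win in Real Estate & in Life');
// 3_ways_to_win_in_real_estate_%26_in_life
```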
This is of course just a basic way to go about it, if you want to exactly imitate the wordpress way of turning a text into a URL, have a look at that code, it's open and available for you to do so.

Does anyone have a PHP snippet of code for grabbing the first "sentence" in a string?

If I have a description like:
"We prefer questions that can be answered, not just discussed. Provide details. Write clearly and simply."
And all I want is:
"We prefer questions that can be answered, not just discussed."
I figure I would search for a regular expression, like "[.!\?]", determine the strpos and then do a substr from the main string, but I imagine it's a common thing to do, so hoping someone has a snippet lying around.
A slightly more costly expression, however will be more adaptable if you wish to select multiple types of punctuation as sentence terminators.
$sentence = preg_replace('/([^?!.]*.).*/', '\\1', $string);
Find termination characters followed by a space
$sentence = preg_replace('/(.*?[?!.](?=\s|$)).*/', '\\1', $string);
<?php
$text = "We prefer questions that can be answered, not just discussed. Provide details. Write clearly and simply.";
$array = explode('.',$text);
$text = $array[0];
?>
My previous regex seemed to work in the tester but not in actual PHP. I have edited this answer to provide full, working PHP code, and an improved regex.
$string = 'A simple test!';
var_dump(get_first_sentence($string));
$string = 'A simple test without a character to end the sentence';
var_dump(get_first_sentence($string));
$string = '... But what about me?';
var_dump(get_first_sentence($string));
$string = 'We at StackOverflow.com prefer prices below US$ 7.50. Really, we do.';
var_dump(get_first_sentence($string));
$string = 'This will probably break after this pause .... or won\'t it?';
var_dump(get_first_sentence($string));
function get_first_sentence($string) {
$array = preg_split('/(^.*\w+.*[\.\?!][\s])/', $string, -1, PREG_SPLIT_DELIM_CAPTURE);
    // preg_split() returns a single element when nothing matches, so guard the second index
    return trim($array[0] . (isset($array[1]) ? $array[1] : ''));
}
Try this:
// Single quotes outside, so the double quotes inside the text do not end the string
$content = 'My name is Younas. I live on the pakistan. My email is **fromyounas#gmail.com** and skype name is "**fromyounas**". I loved to work in **IOS development** and website development . ';
$dot = ".";
//find first dot position
$position = stripos ($content, $dot);
//if there's a dot in our source text do
if($position !== false) {
//prepare offset
$offset = $position + 1;
//find second dot using offset
$position2 = stripos ($content, $dot, $offset);
$result = substr($content, 0, $position2);
//add a dot
echo $result . '.';
}
Output is:
My name is Younas. I live on the pakistan.
current(explode(".",$input));
I'd probably use any of the multitudes of substring/string-split functions in PHP (some mentioned here already).
But also look for ". " OR ".\n" (and possibly ".\r\n") instead of just ".", in case the sentence contains a period that isn't followed by a space. I think it will increase the likelihood of you getting genuine results.
Example, searching for just "." on:
"I like stackoverflow.com."
Will get you:
"I like stackoverflow."
When really, I'm sure you'd prefer:
"I like stackoverflow.com."
And once you have that basic search, you'll probably come across one or two occasions where it may miss something. Tune as you run with it!
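The "terminator followed by whitespace or end of text" rule described above can be sketched with a single preg_match(). The function name first_sentence is hypothetical, and this is not a full sentence splitter: abbreviations such as "e.g. " or a leading "... " will still end the match early.

```php
<?php
// Hypothetical helper: returns the text up to the first sentence terminator.
function first_sentence(string $text): string
{
    $text = trim($text);
    // Lazily take characters up to the first ".", "!" or "?" that is
    // followed by whitespace or by the end of the text.
    if (preg_match('/^.*?[.!?](?=\s|$)/s', $text, $m)) {
        return $m[0];
    }
    // No terminator found: treat the whole text as one sentence.
    return $text;
}

echo first_sentence('I like stackoverflow.com. It has answers.');
// I like stackoverflow.com.
```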
Try this:
// reset() takes its argument by reference, so index the result directly instead
explode('.', $s, 2)[0];
