Formatting text with regex - php

I have two functions to format the text of my notices.
1. Converts [white-text][/white-text] into <font color=white></font>
$string = preg_replace("/\[white-text\](\S+?)\[\/white-text\]/si","<font color=white>\\1</font>", $string);
2. Converts [url][/url] into <a href></a>
$string = preg_replace("/\[url\](\S+?)\[\/url\]/si","<a href=\"\\1\">\\1</a>", $string);
Problems:
WHITE-TEXT - It only changes the color if the phrase has only ONE word.
URL - It works fine, but I would like to be able to write anything in the readable part of the URL.

Change the URL tag to the form [url=href]description[/url]; you can then use this simple regex (group 1 captures the href, group 2 the description):
"/\[url=([^\]]*)\](.+?)\[\/url\]/si"
"<a href=\"\\1\">\\2</a>"

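A minimal sketch combining both conversions, assuming the notice text is trusted, so the href is neither escaped nor validated. The white-text change from (\S+?) to (.+?) is an assumption that the answer above does not make. \S cannot match the space between two words, which would explain why only one-word phrases were colored. The function name formatNotice is hypothetical.

```php
<?php
// Hypothetical helper: converts the two notice tags from the question.
// Assumption: notice text is trusted, so the href is not escaped or validated.
function formatNotice(string $string): string
{
    // (.+?) is lazy and, with the s flag, also matches spaces and newlines,
    // so multi-word phrases work. (\S+?) stopped at the first space.
    $string = preg_replace(
        '/\[white-text\](.+?)\[\/white-text\]/si',
        '<font color=white>$1</font>',
        $string
    );

    // [url=href]description[/url]: group 1 is the href, group 2 the link text.
    return preg_replace(
        '/\[url=([^\]]*)\](.+?)\[\/url\]/si',
        '<a href="$1">$2</a>',
        $string
    );
}

echo formatNotice('[white-text]two words[/white-text] and [url=http://example.com]any text here[/url]');
// <font color=white>two words</font> and <a href="http://example.com">any text here</a>
```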

How to convert an url into a hyperlink with php urlencode()?

I am trying to convert plain URL text into a valid link.
The problem is that my link might contain both English (A-Z/a-z) and Hebrew (אבגדהוזחטיכךלמםנןסעפףצץקרשת) letters.
Using PHP's urlencode() function, I was able to get the correct format for the Hebrew, but I cannot find the right way to convert it into a link.
My code so far (does not work with Hebrew letters):
$replyText = preg_replace('#(https?://([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.\#-]*(\?\S+)?[^\.\s])?)?)#', '<a href="$1">$1</a>', $replyText);
An example of a URL I need to convert into a link:
google.co.il%2F%D7%A9%D7%9C%D7%95%D7%9D_Hello.html
It should become:
<a href="google.co.il%2F%D7%A9%D7%9C%D7%95%D7%9D_Hello.html">google.co.il%2F%D7%A9%D7%9C%D7%95%D7%9D_Hello.html</a>
Despite what I believe you have posted as the desired output, if this were my task, I would use a urlencoded href value in the <a> tag and human-readable link text.
Code:
$replyText = "google.co.il%2F%D7%A9%D7%9C%D7%95%D7%9D_Hello.html";
echo '<a href="', $replyText, '">', urldecode($replyText), '</a>';
Source Code Output:
<a href="google.co.il%2F%D7%A9%D7%9C%D7%95%D7%9D_Hello.html">google.co.il/שלום_Hello.html</a>
Effective Output:
google.co.il/שלום_Hello.html
Notice that when you hover over the link, your browser's status bar will show the decoded URL anyway.
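Here is the same idea as a reusable helper, offered as a sketch. makeLink is a hypothetical name. The htmlspecialchars() calls are an addition that the answer does not use; they stop a quote or < in the input from breaking out of the attribute or the link text.

```php
<?php
// Hypothetical helper: percent-encoded href, human-readable link text.
function makeLink(string $encodedUrl): string
{
    // Keep the percent-encoded form in the href attribute...
    $href = htmlspecialchars($encodedUrl, ENT_QUOTES, 'UTF-8');
    // ...and show the decoded (readable) form as the link text.
    $text = htmlspecialchars(urldecode($encodedUrl), ENT_QUOTES, 'UTF-8');
    return '<a href="' . $href . '">' . $text . '</a>';
}

echo makeLink('google.co.il%2F%D7%A9%D7%9C%D7%95%D7%9D_Hello.html');
// <a href="google.co.il%2F%D7%A9%D7%9C%D7%95%D7%9D_Hello.html">google.co.il/שלום_Hello.html</a>
```

As in the question, the address has no scheme, so a browser will treat this href as a relative path. Prepend http:// or https:// if it should point to another site.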
You just need to replace %2F with /, so your link becomes: google.co.il/%D7%A9%D7%9C%D7%95%D7%9D_Hello.html
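Here is a sketch of this second approach. Only the encoded slash is decoded, and the Hebrew part stays percent-encoded in both the href and the link text. Using str_ireplace() rather than str_replace() is an assumption, so that a lowercase %2f would be handled too.

```php
<?php
// Decode only the slashes (%2F or %2f); leave the Hebrew percent-encoded.
$encoded = 'google.co.il%2F%D7%A9%D7%9C%D7%95%D7%9D_Hello.html';
$href = str_ireplace('%2F', '/', $encoded);

echo '<a href="' . $href . '">' . $href . '</a>';
// <a href="google.co.il/%D7%A9%D7%9C%D7%95%D7%9D_Hello.html">google.co.il/%D7%A9%D7%9C%D7%95%D7%9D_Hello.html</a>
```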

Regex not preceded by href="

So I am adding [embed][/embed] around YouTube links in a WordPress environment. If you use fields other than the normal content editor for content input in the backend, WordPress won't do this automatically (even when you run the value through apply_filters() on the_content).
So, I found this regex, which works perfectly for my application:
$firstalinea = preg_replace('/\s*[a-zA-Z\/\/:\.]*youtu(be.com\/watch\?v=|.be\/)([a-zA-Z0-9\-_]+)([a-zA-Z0-9\/\*\-\_\?\&\;\%\=\.]*)/i', '[embed]https://www.youtube.com/watch?v=$2[/embed]', $firstalinea);
Except for one thing: if someone posts a link to a YouTube video instead of wanting to embed it, the regex replaces that URL too, and then the link no longer works.
So, how can I make the regex NOT match when the URL is preceded by href="?
Thanks!
Solved it:
$re = '/(?<!href=\")(http:\/\/|https:\/\/)(?:www\.)?youtu(be.com\/watch\?v=|.be\/)([a-zA-Z0-9\-_]+)([a-zA-Z0-9\/\*\-\_\?\&\;\%\=\.]*)/i';
$firstalinea = preg_replace($re, '[embed]https://www.youtube.com/watch?v=$3[/embed]', $firstalinea);
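Here is a quick check of the solution above, run on a hypothetical sample string. The bare youtu.be URL is wrapped in [embed] tags. The URL inside href="..." is left alone, because the negative lookbehind (?<!href=") fails there.

```php
<?php
// The lookbehind (?<!href=\") skips URLs that directly follow href=".
$re = '/(?<!href=\")(http:\/\/|https:\/\/)(?:www\.)?youtu(be.com\/watch\?v=|.be\/)([a-zA-Z0-9\-_]+)([a-zA-Z0-9\/\*\-\_\?\&\;\%\=\.]*)/i';

// Hypothetical sample: one bare URL, one URL inside a link.
$firstalinea = 'Watch https://youtu.be/abc123 or <a href="https://www.youtube.com/watch?v=xyz789">Link</a>';
// $3 is the video ID (group 1 = scheme, group 2 = youtu... variant).
$firstalinea = preg_replace($re, '[embed]https://www.youtube.com/watch?v=$3[/embed]', $firstalinea);

echo $firstalinea;
// Watch [embed]https://www.youtube.com/watch?v=abc123[/embed] or <a href="https://www.youtube.com/watch?v=xyz789">Link</a>
```

The lookbehind only inspects the six characters right before the URL. An href written with single quotes (href='...') would therefore still be converted, and so would a YouTube URL used as the visible text of a link.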

Slugs for SEO using PHP - Appending name to end of URL

Something I have noticed on the StackOverflow website:
If you visit the URL of a question on StackOverflow.com:
"https://stackoverflow.com/questions/10721603"
The website adds the name of the question to the end of the URL, so it turns into:
"https://stackoverflow.com/questions/10721603/grid-background-image-using-imagebrush"
This is great. I understand that this makes the URL more meaningful and is probably a good SEO technique.
What I wanted to Achieve after seeing this Implementation on StackOverflow
I wish to implement the same thing with my website. I am happy using a header() 301 redirect in order to achieve this, but I am attempting to come up with a tight script that will do the trick.
My Code so Far
// Set the title of the page article (This could be from the database). Trimming any spaces either side
$original_name = trim(' How to get file creation & modification date/times in Python with-dash?');
// Replace any characters that are not A-Za-z0-9 or a dash with a space
$replace_strange_characters = preg_replace('/[^\da-z-]/i', " ", $original_name);
// Replace any spaces (or multiple spaces) with a single dash to make it URL friendly
$replace_spaces = preg_replace("/([ ]{1,})/", "-", $replace_strange_characters);
// Remove leading and trailing dashes, and any run of two or more dashes
$removed_dashes = preg_replace("/^([\-]{0,})|([\-]{2,})|([\-]{0,})$/", "", $replace_spaces);
// Show the finished name on the screen
print_r($removed_dashes);
The Problem
I have created this code and, by the looks of things, it works fine: it makes the string URL-friendly and readable to the human eye. However, I would like to see whether it is possible to simplify or "tighten it up" a bit, as I feel my code is probably overcomplicated.
It is not so much that I want it put onto one line, because I could do that by nesting the functions into one another, but I feel that there might be an overall simpler way of achieving it - I am looking for ideas.
In summary, the code achieves the following:
Removes any "strange" characters and replaces them with a space
Replaces any spaces with a dash to make it URL friendly
Returns a string with no spaces, with words separated by dashes, and with no leading or trailing spaces or dashes
Keeps the string readable (no percent signs or + symbols, as you would get from simply using urlencode())
Thanks for your help!
Potential Solutions
While writing this, I found out that what I am looking for is known as a URL 'slug', and that slugs are indeed useful for SEO.
I found this library on Google Code, which appears to work well at first glance.
There is also a notable question about this on SO, which has other examples.
I tried to play with preg_replace like you did. However, it gets more and more complicated once you start looking at foreign languages.
What I ended up doing was simply trimming the title and using urlencode():
$url_slug = urlencode($title);
I also had to add these:
$title = str_replace('/','',$title); //Apache doesn't like this character even encoded
$title = str_replace('\\','',$title); //Apache doesn't like this character even encoded
There are also third-party libraries, such as http://cubiq.org/the-perfect-php-clean-url-generator
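Here is a sketch of the urlencode() approach from this answer as a single function; titleToSlug is a hypothetical name. The order matters. The slashes must be removed before urlencode() runs, because afterwards they have already become %2F and %5C, and str_replace() would no longer find them.

```php
<?php
// Hypothetical helper following the answer: trim, drop / and \, then urlencode().
function titleToSlug(string $title): string
{
    $title = trim($title);
    // Remove the characters Apache rejects first; once encoded they would be %2F and %5C.
    $title = str_replace(['/', '\\'], '', $title);
    return urlencode($title);
}

echo titleToSlug(' date/times in Python ');
// datetimes+in+Python
```

Spaces come out as + signs, which is exactly what the question wanted to avoid. The trade-off is that titles in any alphabet survive, percent-encoded.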
Indeed, you can simplify it:
$original_name = ' How to get file creation & modification date/times in Python with-dash?';
$result = preg_replace('~[^a-z0-9]++~i', '-', $original_name);
$result = trim($result, '-');
To deal with other alphabets, you can use either of these patterns instead:
~\P{Xan}++~u
or
~[^\pL\pN]++~u
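Here is a sketch that wraps the two lines above in a function, with a flag to switch to the Unicode-aware pattern; slugify and its $unicode parameter are hypothetical. \pL matches any letter and \pN any number, and the u modifier makes PCRE treat the subject as UTF-8. The possessive ++ only stops the engine from backtracking into a run of separators. A plain + gives the same result here.

```php
<?php
// Hypothetical helper: the answer's replace-then-trim, with an optional Unicode pattern.
function slugify(string $title, bool $unicode = false): string
{
    $pattern = $unicode ? '~[^\pL\pN]++~u' : '~[^a-z0-9]++~i';
    // Every run of "other" characters becomes a single dash...
    $slug = preg_replace($pattern, '-', $title);
    // ...and dashes left at either end (from leading/trailing junk) are trimmed.
    return trim($slug, '-');
}

echo slugify(' How to get file creation & modification date/times in Python with-dash?'), "\n";
// How-to-get-file-creation-modification-date-times-in-Python-with-dash
echo slugify('שלום עולם!', true), "\n";
// שלום-עולם
```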

Link converting pregmatch working with markdown

function makeLinks($text) {
$text = preg_replace('%(?<!href=")(((f|ht){1}(tp://|tps://))[-a-zA-Z0-9#:\%_\+.~#?&//=]+)%i',
'<a href="\\1">\\1</a>', $text);
$text = preg_replace('%([[:space:]()[{}])(www.[-a-zA-Z0-9#:\%_\+.~#?&//=]+)%i',
'\\1<a href="http://\\2">\\2</a>', $text);
return $text;
}
It misses links written like - www.website.org (a hyphen, then a space) at the beginning of a line. If I have - www.website.org - www.website.org, it catches the second one.
Shouldn't that be covered by the space class in the second preg_replace?
I also tried %(\s\n\r(){})
I am running it through Markdown, but only afterwards (markdown(makeLinks($foo))), so I thought that shouldn't interfere. However, when I take Markdown off and everything is just echoed out on one line, it does make links out of them. If I use makeLinks(markdown($foo)) instead, it behaves the same as before: it doesn't make links out of the ones that begin with www at the start of list items.
That's some pretty dodgy regex work there. Here is a regex I would recommend instead for URL detection:
%(?<!href=)(?<!href=")(((f|ht)(tp://|tps://))?[a-zA-Z0-9-].[-a-zA-Z0-9#:\%_\+.~#?&//=]+)%i
Should be a lot more reliable than the two you have now.
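This is a different technique, sketched under assumptions rather than taken from the answer. preg_replace_callback() lets one pattern handle both http(s):// and bare www. addresses, adding http:// to the href only for the latter. makeLinks and the exact character class are hypothetical choices.

The two fixed-length lookbehinds (?<!href=)(?<!href=") mean the same as (?<!href="?). PCRE versions before 10.43 reject that combined form as variable-length. The extra (?<![\w/.]) stops www. from matching in the middle of an existing URL.

```php
<?php
// Hypothetical alternative, not the answer's regex: link bare http(s):// and www. URLs.
// (?<!href=) and (?<!href=") are fixed-length lookbehinds (a single (?<!href="?)
// is variable-length). (?<![\w/.]) stops "www." matching inside a longer URL.
function makeLinks(string $text): string
{
    $pattern = '%(?<!href=)(?<!href=")(?<![\w/.])((?:https?://|www\.)[-a-z0-9@:\%_+.~#?&/=]+)%i';

    return preg_replace_callback($pattern, function (array $m): string {
        $url = $m[1];
        // Bare www. addresses need a scheme, or the browser treats them as relative paths.
        $href = stripos($url, 'www.') === 0 ? 'http://' . $url : $url;
        return '<a href="' . $href . '">' . $url . '</a>';
    }, $text);
}

echo makeLinks('- www.website.org');
// - <a href="http://www.website.org">www.website.org</a>
```

Like the regexes above, this is not a full URL parser. A sentence-ending period is swallowed into the link, and a URL already used as link text would be wrapped again.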

How to use substr at regexp of preg replace

I am making a BBCode for YouTube videos. Users can post a video as BBCode, e.g. [youtube]http://www.youtube.com/watch?v=ihK2pPcDSHM[/youtube], which is then converted to HTML. Instead of the video itself, I want to show the video's image (thumbnail). So I do it like this:
$string = preg_replace("~\[yt]http://www.youtube.com/watch\?v=(.*)\[/yt]~Uis","<img src=\"http://img.youtube.com/vi/\\1/0.jpg\" />", $string);
It shows the image, but when somebody puts a url like:
http://www.youtube.com/watch?v=ihK2pPcDSHM&feature=channel
Then the image URL becomes http://img.youtube.com/vi/ihK2pPcDSHM&feature=channel/0.jpg, which does not lead to a valid image. I tried changing \\1 to ".substr('\\1', 0,11)." but it had no effect.
Any suggestion to solve this? Thanks!
Try a different pattern like:
~\[yt]http://www\.youtube\.com/watch\?v=([a-z0-9-_]+).*?\[/yt]~is
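Here is a quick demonstration of this pattern; ytThumbnail is a hypothetical wrapper. The capture group only accepts ID characters, so it stops at the &. The lazy .*? then skips whatever follows, up to [/yt].

```php
<?php
// Hypothetical wrapper around the answer's pattern.
function ytThumbnail(string $string): string
{
    // ([a-z0-9-_]+) captures the ID (i flag covers uppercase); .*? skips e.g. "&feature=channel".
    return preg_replace(
        '~\[yt]http://www\.youtube\.com/watch\?v=([a-z0-9-_]+).*?\[/yt]~is',
        '<img src="http://img.youtube.com/vi/$1/0.jpg" />',
        $string
    );
}

echo ytThumbnail('[yt]http://www.youtube.com/watch?v=ihK2pPcDSHM&feature=channel[/yt]');
// <img src="http://img.youtube.com/vi/ihK2pPcDSHM/0.jpg" />
```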
Just tell your regex to stop at the first & character (and skip everything after it up to [/yt]):
$string = preg_replace("~\[yt]http://www\.youtube\.com/watch\?v=([^&\[]*)[^\[]*\[/yt]~is","<img src=\"http://img.youtube.com/vi/\\1/0.jpg\" />", $string);
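On the substr() question itself: ".substr('\\1', 0,11)." has no effect because PHP evaluates substr() while building the replacement string, before preg_replace() has matched anything. It therefore truncates the literal two-character text \1 rather than the captured ID. preg_replace_callback() instead runs your code on each match. This sketch assumes video IDs are 11 characters long, as the question's substr() call implies; the function name ytThumbnailCallback is hypothetical.

```php
<?php
// Hypothetical helper: substr() runs on the captured text inside the callback.
// Assumption: YouTube video IDs are 11 characters, as the question's substr(..., 0, 11) implies.
function ytThumbnailCallback(string $string): string
{
    return preg_replace_callback(
        '~\[yt\]http://www\.youtube\.com/watch\?v=([^\[]*)\[/yt\]~i',
        function (array $m): string {
            // $m[1] is e.g. "ihK2pPcDSHM&feature=channel"; keep the first 11 characters.
            $id = substr($m[1], 0, 11);
            return '<img src="http://img.youtube.com/vi/' . $id . '/0.jpg" />';
        },
        $string
    );
}

echo ytThumbnailCallback('[yt]http://www.youtube.com/watch?v=ihK2pPcDSHM&feature=channel[/yt]');
// <img src="http://img.youtube.com/vi/ihK2pPcDSHM/0.jpg" />
```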
