Using a regex to convert a URL to a link


I need to convert the URLs in an article to the 3g domain.
For example, I need to convert
here is the link:http://www.mydomain.com/index thanks
to
here is the link:<a href='http://3g.mydomain.com/index' target='_self'>http://3g.mydomain.com/index</a> thanks
Only mydomain should be converted, not other domains. Here is the code:
$c = "/([^'\"=])?http:\/\/([^ ]+?)(mydomain)\.com([A-Za-z0-9&%\?=\/\-\._#]*)/";
$b=preg_replace($c, "$1<a href='http://3g.$3.com$4' target='_self'>http://3g.$3.com$4</a>",$b);
It works very well, but if the text is like this:
a link
it returns the wrong result, like this:
a link
but I need this result:
a link
What should I do?

You should do the following:
Strip target attributes from existing hyperlinks
Rewrite hyperlinks in href attributes
Rewrite any other hyperlinks
$plain = "http://([^ ]+?)(mydomain)\.com(/?[^'\"\s]*(?=['\"\s]))";
$plain_replace = "http://3g.$2.com$3"; // groups in $plain: 1 = subdomain, 2 = mydomain, 3 = path
$in_href = "href=(['\"])" . $plain . "(['\"])"; // the leading quote group shifts the numbering by one
$in_href_replace = "href='http://3g.$3.com$4' target='_self'";
$strip_target = "target=['\"][^'\"]*['\"]";
...
So:
Replace $strip_target with ""
Replace $in_href with $in_href_replace
Replace $plain with $plain_replace
(The regexes were tested in C#. For PHP, wrap each pattern in delimiters such as ~...~ before passing it to preg_replace, and you might have to adjust the \ escaping to suit the PHP regex rules.)
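The three steps above can be sketched in PHP as follows. This is a minimal sketch, not a tested port of the C# version: the helper name rewriteTo3g() and the ~ delimiter are assumptions, and the leading \s* in the target pattern is added here only to avoid leaving a stray space behind.

```php
<?php
// Hypothetical helper; the name and the ~ delimiter are assumptions.
function rewriteTo3g(string $html): string
{
    // Groups inside $plain: 1 = subdomain part, 2 = "mydomain", 3 = path.
    $plain = 'http://([^ ]+?)(mydomain)\.com(/?[^\'"\s]*(?=[\'"\s]))';

    // Step 1: strip existing target attributes (from every link).
    $html = preg_replace('~\s*target=[\'"][^\'"]*[\'"]~', '', $html);

    // Step 2: rewrite URLs inside href attributes. The leading quote group
    // shifts the numbering, so "mydomain" is $3 and the path is $4 here.
    $html = preg_replace(
        '~href=([\'"])' . $plain . '([\'"])~',
        "href='http://3g.$3.com$4' target='_self'",
        $html
    );

    // Step 3: rewrite any remaining plain URLs.
    return preg_replace('~' . $plain . '~', 'http://3g.$2.com$3', $html);
}

echo rewriteTo3g('<a href="http://www.mydomain.com/index" target="_blank">a link</a>'), "\n";
```

Because of the lookahead at the end of $plain, a URL is only rewritten when a quote or whitespace follows it, so a URL at the very end of the input is left alone.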

Get rid of the first ? in your regular expression. That ? makes the preceding character optional, so the pattern also matches a URL that comes right after a quote or an = sign.
Or, perhaps more to your intention, if you want to allow URLs at the beginning, you can replace:
([^'\"=])?
with:
(^|[^'\"=])
...which will allow a link if at the very beginning, or if not preceded by a quote, etc., but not otherwise.
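As a rough illustration of that change, the sketch below applies the asker's pattern with (^|[^'\"=]) in place of ([^'\"=])?. The function name linkify3g() and the ~ delimiter are assumptions made for the example; the rest of the pattern and the replacement string come from the question.

```php
<?php
// Hypothetical wrapper around the asker's preg_replace call.
function linkify3g(string $text): string
{
    // (^|[^'"=]) requires the URL to sit at the start of the text or after
    // a character that is not a quote or "=", so href values are skipped.
    $pattern = '~(^|[^\'"=])http://([^ ]+?)(mydomain)\.com([A-Za-z0-9&%?=/\-._#]*)~';
    $replacement = "$1<a href='http://3g.$3.com$4' target='_self'>http://3g.$3.com$4</a>";

    return preg_replace($pattern, $replacement, $text);
}

echo linkify3g('here is the link:http://www.mydomain.com/index thanks'), "\n";
echo linkify3g("<a href='http://www.mydomain.com/index'>a link</a>"), "\n";
```

The first call wraps the plain URL in an anchor; the second leaves the existing anchor unchanged because the URL follows a quote.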

Related

how to filter link url http:// and www with preg replace

I wrote some code using preg_replace to build the preview URL for an Instagram post.
This is the code:
$strings = htmlspecialchars_uni($thread['soc_instagram']);
$searchs = array('~(?:https://instagram\.com/p/)?([a-zA-Z0-9_\-+?:]+)~');
$replaces = array('https://instagram.com/p/$1/media/?size=l');
$soc_instagram = preg_replace($searchs,$replaces,$strings);
The code works perfectly if I post an Instagram link with this URL: https://instagram.com/p/BarUcqwht_u
and it produces
https://instagram.com/p/BaQsAubg6H3/media/?size=l
The problem is that when I add www to the URL, as in https://www.instagram.com/p/BarUcqwht_u, the code produces a broken string.
The result looks like this:
https://instagram.com/p/https:/media/?size=l//https://instagram.com/p/www/media/?size=l.https://instagram.com/p/instagram/media/?size=l.https://instagram.com/p/com/media/?size=l/https://instagram.com/p/p/media/?size=l/https://instagram.com/p/BarUcqwht_u/media/?size=l/
I tried adding www to my preg_replace code, but then the result looks like this:
https://www.instagram.com/p/https:/media/?size=l//https://www.instagram.com/p/instagram/media/?size=l.https://www.instagram.com/p/com/media/?size=l/https://www.instagram.com/p/p/media/?size=l/https://www.instagram.com/p/BaQsAubg6H3/media/?size=l
Any help would be appreciated, thanks.

Add the www. after the protocol and make it optional. preg_replace also doesn't require arrays, strings work fine.
$strings = 'https://www.instagram.com/p/BarUcqwht_u';
$searchs = '~(?:https://(?:www\.)?instagram\.com/p/)?([a-zA-Z0-9_\-+?:]+)~';
$replaces = 'https://instagram.com/p/$1/media/?size=l';
$soc_instagram = preg_replace($searchs,$replaces,$strings);
echo $soc_instagram;
Demo: https://3v4l.org/WSors
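With www in the URL, the optional prefix in your current pattern no longer matches, so the pattern falls back to matching every run of characters from your character class (https:, www, instagram, com, p and the ID) and replaces each run with the URL; the / and . characters between the runs are left in place. See https://regex101.com/r/PXb2h1/2/ for a visual representation.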
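One way to avoid that fallback entirely is to anchor the pattern so it must match the whole input. The following is only a sketch under assumptions: the function name instagramPreviewUrl(), the http/https alternative, the optional trailing slash and the narrower ID character class are not from the question or the answer.

```php
<?php
// Hypothetical helper: returns the preview URL, or null if the input is
// neither a bare post ID nor an instagram.com/p/ URL.
function instagramPreviewUrl(string $input): ?string
{
    // ^ and $ force the pattern to cover the whole input, so fragments
    // such as "www" or "com" can never match on their own.
    $pattern = '~^(?:https?://(?:www\.)?instagram\.com/p/)?([A-Za-z0-9_-]+)/?$~';

    if (preg_match($pattern, trim($input), $m) !== 1) {
        return null;
    }

    return 'https://instagram.com/p/' . $m[1] . '/media/?size=l';
}

var_dump(instagramPreviewUrl('https://www.instagram.com/p/BarUcqwht_u'));
```

Returning null for unrecognised input makes it easier to skip the preview instead of printing a mangled URL.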

converting url separators with slash

I have a category named like this:
$name = 'Construction / Real Estate';
Those are two different categories, and I display results from the database
for each of them. But before that, I have to send the user to a URL just for that category.
Here is the problem: if I do something like this:
echo "<a href='site.com/category/{$name}'> $name </a>";
the URL becomes
site.com/category/Construction%20/%20Real%20Estate
I am trying to remove the %20 and turn it into /, so I tried str_replace('%20', '/', $name);
but that produces something like this:
site.com/category/Construction///Real/Estate
The extra slashes are the problem.
Since it is one word, I want it to appear as Construction/RealEstate only.
I could do this with at least 10 lines of code, but I was hoping there is a regex or a simple PHP way to fix it.

You have a string for human consumption, and based on that string you want to create a URL.
To keep any characters from messing up your HTML, or from being abused for an XSS attack, you need to escape the human-readable string in the context of HTML using htmlspecialchars():
$name = 'Construction / Real Estate';
echo "<h1>".htmlspecialchars($name)."</h1>";
If that name should go into a URL, it must also be escaped:
$url = "site.com/category/".rawurlencode($name);
If any URL should go into HTML, it must be escaped for HTML:
echo "<a href='".htmlspecialchars($url)."'>";
Now the problem with slashes in URLs is that they are most likely not accepted as a regular character, even if they are escaped in the URL. Space characters also do not fit nicely into a URL, although they work.
And then there is that black magic of search engine optimization.
For whatever reason, you should convert your category string before you inject it as part of the URL. Do that BEFORE you encode it.
As a general rule, lowercase characters are better, spaces should be dashes instead, and the slash probably should be a dash too:
$urlname = strtr(mb_strtolower($name), array(" " => "-", "/" => "-"));
And then again:
$url = "site.com/category/".rawurlencode($urlname);
echo "<a href='".htmlspecialchars($url)."'>";
In fact, using htmlspecialchars() is not really enough. Escaping output that goes into an HTML attribute differs from escaping output that becomes an element's content. If you have a look at the escaper class from Zend Framework 2, you realize that escaping an HTML attribute value is a lot more complicated.
No, there is nothing you can do to make it easier. The only chance is to use a function that does everything you need to make things easier for you, but you still need to apply the correct escaping everywhere.
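Putting the steps of this answer together might look like the sketch below. The helper name categorySlug() is hypothetical, and collapsing runs of spaces and slashes into a single dash is an assumption that goes slightly beyond the strtr() call above (which would yield three dashes in a row for " / "). ENT_QUOTES is passed explicitly because the attribute is single-quoted and older PHP versions do not escape single quotes by default.

```php
<?php
// Hypothetical helper: turn a human-readable category name into a URL part.
function categorySlug(string $name): string
{
    $slug = mb_strtolower($name);
    // Collapse every run of whitespace and slashes into one dash.
    $slug = preg_replace('~[\s/]+~u', '-', $slug);

    return trim($slug, '-');
}

$name = 'Construction / Real Estate';

// Encode for the URL first, then escape the URL and the label for HTML.
$url  = 'site.com/category/' . rawurlencode(categorySlug($name));
$link = "<a href='" . htmlspecialchars($url, ENT_QUOTES) . "'>"
      . htmlspecialchars($name, ENT_QUOTES) . '</a>';

echo $link, "\n";
```

The page that receives the request then has to map the slug back to the category, for example by storing the slug next to the name in the database.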

You can use a simple solution like this:
$s = "site.com/category/Construction%20/%20Real%20Estate";
$s = str_replace('%20', '', $s);
echo $s; // site.com/category/Construction/RealEstate

Perhaps, you want to use urldecode() and remove the whitespace afterwards?

Slugs for SEO using PHP - Appending name to end of URL

Something I have noticed on the StackOverflow website:
If you visit the URL of a question on StackOverflow.com:
"https://stackoverflow.com/questions/10721603"
The website adds the name of the question to the end of the URL, so it turns into:
"https://stackoverflow.com/questions/10721603/grid-background-image-using-imagebrush"
This is great, I understand that this makes the URL more meaningful and is probably good as a technique for SEO.
What I wanted to Achieve after seeing this Implementation on StackOverflow
I wish to implement the same thing with my website. I am happy using a header() 301 redirect in order to achieve this, but I am attempting to come up with a tight script that will do the trick.
My Code so Far
// Set the title of the page article (This could be from the database). Trimming any spaces either side
$original_name = trim(' How to get file creation & modification date/times in Python with-dash?');
// Replace any characters that are not A-Za-z0-9 or a dash with a space
$replace_strange_characters = preg_replace('/[^\da-z-]/i', " ", $original_name);
// Replace any spaces (or multiple spaces) with a single dash to make it URL friendly
$replace_spaces = preg_replace("/([ ]{1,})/", "-", $replace_strange_characters);
// Remove any leading or trailing dashes (and any run of two or more dashes)
$removed_dashes = preg_replace("/^([\-]{0,})|([\-]{2,})|([\-]{0,})$/", "", $replace_spaces);
// Show the finished name on the screen
print_r($removed_dashes);
The Problem
I have created this code and it appears to work fine: it makes the string URL friendly and readable to the human eye. However, I would like to see if it is possible to simplify it or "tighten it up" a bit, as I feel my code is probably overcomplicated.
It is not so much that I want it put onto one line, because I could do that by nesting the functions into one another, but I feel that there might be an overall simpler way of achieving it - I am looking for ideas.
In summary, the code achieves the following:
Removes any "strange" characters and replaces them with a space
Replaces any spaces with a dash to make it URL friendly
Returns a string without any spaces, with words separated with dashes and has no trailing spaces or dashes
String is readable (doesn't contain percentage signs and + symbols, as it would when simply using urlencode())
Thanks for your help!
Potential Solutions
I found out whilst writing this that I am looking for what is known as a URL 'slug', and that slugs are indeed useful for SEO.
I found this library on Google code which appears to work well in the first instance.
There is also a notable question on this on SO which can be found here, which has other examples.

I tried to play with preg like you did. However it gets more and more complicated when you start looking at foreign languages.
What I ended up doing was simply trimming the title and using urlencode:
$url_slug = urlencode($title);
I also had to add these:
$title = str_replace('/','',$title); //Apache doesn't like this character even encoded
$title = str_replace('\\','',$title); //Apache doesn't like this character even encoded
There are also 3rd party libraries such as: http://cubiq.org/the-perfect-php-clean-url-generator

Indeed, you can do that:
$original_name = ' How to get file creation & modification date/times in Python with-dash?';
$result = preg_replace('~[^a-z0-9]++~i', '-', $original_name);
$result = trim($result, '-');
To deal with other alphabets you can use this pattern instead:
~\P{Xan}++~u
or
~[^\pL\pN]++~u
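Wrapped into a function, the Unicode-aware variant could look roughly like this. The name slugify() is made up for the example, and the input is assumed to be valid UTF-8 (with the u modifier, preg_replace returns null otherwise, which the sketch turns into an empty string).

```php
<?php
// Hypothetical helper built on the ~[^\pL\pN]++~u pattern above.
function slugify(string $title): string
{
    // Replace every run of characters that are neither letters nor
    // digits (in any alphabet) with a single dash.
    $slug = preg_replace('~[^\pL\pN]++~u', '-', $title);

    // preg_replace returns null on invalid UTF-8; fall back to ''.
    return trim($slug ?? '', '-');
}

echo slugify(' How to get file creation & modification date/times in Python with-dash?'), "\n";
```

Lowercasing with mb_strtolower() before or after the replacement is a common extra step, but it is not part of the answer above.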

Replace anchor text with PHP (and regular expression)

I have a string that contains a lot of links and I would like to adjust them before they are printed to screen:
I have something like the following:
replace_this
and would like to end up with something like this
replace this
Normally I would just use something like:
echo str_replace("_"," ",$url);
In this case I can't do that, as the URL contains underscores and it would break my links. My thought was that I could use a regular expression to get around this.
Any ideas?

Here's the regex: <a(.+?)>.+?<\/a>.
What I'm doing is preserving the attributes inside the opening anchor tag and replacing the anchor text, using the following call:
preg_replace('/<a(.+?)>.+?<\/a>/i',"<a$1>REPLACE</a>",$url);

This will cover most cases, but I suggest you review to make sure that nothing unexpected was missed or changed.
$pattern = "/_(?=[^>]*<)/"; // an underscore that is followed by "<" before any ">", i.e. outside a tag
$url = preg_replace($pattern, " ", $url);

You can use this regular expression
(>(.*)<\s*/)
along with preg_replace_callback .
EDIT :
// PHP has no "g" modifier; preg_replace_callback already replaces every match
$replaced_text = preg_replace_callback('~>(.*?)<\s*/~', 'uscore_replace', $text);
function uscore_replace($matches){
// $matches[0] is the whole match, e.g. ">replace_this</"
return str_replace('_', ' ', $matches[0]);
}
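A slightly stricter variation on the callback idea is to match whole anchor elements and touch only the text between the opening and closing tags. This is a sketch, not one of the answers above: the function name replaceAnchorUnderscores() and the example.com URL are assumptions, and a regex like this will not cope with every possible HTML input.

```php
<?php
// Hypothetical helper: replace underscores in anchor text only.
function replaceAnchorUnderscores(string $html): string
{
    $result = preg_replace_callback(
        // 1 = opening <a ...> tag, 2 = anchor text, 3 = closing tag
        '~(<a\b[^>]*>)(.*?)(</a>)~is',
        function (array $m): string {
            return $m[1] . str_replace('_', ' ', $m[2]) . $m[3];
        },
        $html
    );

    // preg_replace_callback returns null on error; keep the input then.
    return $result ?? $html;
}

echo replaceAnchorUnderscores('<a href="http://example.com/replace_this">replace_this</a>'), "\n";
```

Underscores in the href attribute and in text outside of anchors are left untouched.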

Removing 'http://' from link via REGEX

What I would like to do is remove the "http://" part of these autogenerated links, below is an example of it.
http://google.com/search?gc...
Here are the regexes I am using in PHP to generate these links from a URL.
$patterns_sp[5] = '~([\S]+)~';
$replaces_sp[5] = '<a href=\1 target="_blank">\1<br/>';
$patterns_sp[6] = '~(?<=\>)([\S]{1,25})[^\s]+~';
$replaces_sp[6] = '\1...</a><br/>';
When these patterns are run on a URL like this:
http://www.google.com/search?gcx=c&ix=c1&sourceid=chrome&ie=UTF-8&q=regex
the REGEX gives me:
http://google.com/search?gc...
Where I am stuck:
There is no obvious reason why I cannot modify the $patterns_sp[6] line to read like this:
$patterns_sp[6] = '~(?<=\>http\:\/\/)([\S]{1,25})[^\s]+~';
However, the REGEX still seems to capture the "http://" part of the address, thus making a long list of these very redundant looking. What I am left with is the same thing as in the first example.

Replace...
$patterns_sp[5] = '~([\S]+)~';
...with...
$patterns_sp[5] = '~^(?:https?|ftp)://([\S]+)~';
Then you can access the protocol-less version with $1 and the whole link with $0.
Optionally, you can remove a leading protocol with something like...
preg_replace('~^(?:https?|ftp)://~', '', $str);

I suggest not writing your own regex, instead have a look at http://php.net/manual/en/function.parse-url.php
Retrieve the components of the URL, then compose a new version that only contains the parts you want.
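A minimal sketch of the parse_url() approach is shown below. The function name displayUrl() and the default limit of 25 characters (borrowed from the {1,25} in the question's pattern) are assumptions; the sketch only builds the display text and leaves generating the surrounding anchor tag to the caller.

```php
<?php
// Hypothetical helper: build a short display text without the scheme.
function displayUrl(string $url, int $maxLen = 25): string
{
    $parts = parse_url($url);

    // Fall back to the raw input when there is no recognisable host.
    if ($parts === false || !isset($parts['host'])) {
        return $url;
    }

    // Recompose only the parts we want to show: host, path and query.
    $display = $parts['host'] . ($parts['path'] ?? '');
    if (isset($parts['query'])) {
        $display .= '?' . $parts['query'];
    }

    // Truncate long URLs and mark the cut with an ellipsis.
    if (strlen($display) > $maxLen) {
        $display = substr($display, 0, $maxLen) . '...';
    }

    return $display;
}

echo displayUrl('http://www.google.com/search?gcx=c&ix=c1&sourceid=chrome&ie=UTF-8&q=regex'), "\n";
```

When the result goes into HTML, both the full URL in the href attribute and this display text should still be passed through htmlspecialchars().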
