I have a robot voice that turns sentences into spoken mp3 files.
But the robot voice can't pronounce urls so I want to filter the urls out.
I have dynamic strings coming in that can look like this:
"Hello my name is Jeffrey"
This works fine, but strings can also contain URLs and look like this:
"Hello http://wwww.google.nl is a very nice site."
or
"Hello how are you doing https://soundcloud.com/theforeignexchangemusic/zo-manmade-sampler …"
or
"Take a look at this picture http://instagram.com/p/xPiSn8Pmli/ "
And so on
If a string contains a URL, I want to replace the URL with a word.
Does anybody know a good way of doing this?
Because the strings are dynamic (Length, content and location) I find it very hard to do.
If someone has a good idea please let me know!
Would be appreciated a lot.
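Your best bet is to use a regular expression to check whether the string contains URLs.
You can then use a regex to find the base domain and vocalize that.
Regex for URLs (anchored with ^ and $, so it only matches a string that consists entirely of a URL):
/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/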
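The "find the base domain and vocalize that" idea could be sketched roughly as below. This is only a sketch under assumptions: the function name speakableText is made up for illustration, it only handles URLs that start with http:// or https://, and it falls back to the phrase "a website" when no host can be parsed.

```php
<?php
// Hypothetical helper: replaces every http(s) URL in $text with its host
// name, so a text-to-speech engine reads "instagram.com" instead of the
// full address.
function speakableText(string $text): string
{
    $result = preg_replace_callback(
        '~https?://[^\s"<>]+~i',
        function (array $m): string {
            $host = parse_url($m[0], PHP_URL_HOST);
            if (!is_string($host) || $host === '') {
                return 'a website'; // fallback when the host cannot be parsed
            }
            // Drop a leading "www." so only the domain is spoken.
            return preg_replace('/^www\./i', '', $host);
        },
        $text
    );

    // preg_replace_callback returns null on a regex error; keep the input then.
    return $result ?? $text;
}

echo speakableText('Take a look at this picture http://instagram.com/p/xPiSn8Pmli/ ');
```

Note that the character class stops at whitespace, so punctuation directly after a URL (such as a closing period) is treated as part of the URL and disappears with it.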
If somebody is looking for exactly the same thing, here is a working example.
<?php
$hallo = 'Hey it works. http://onderbroekenlol.nl something behind the url.';
$str = preg_replace("/(?:https?:\/\/)?(?:[\w]+\.)([a-zA-Z\.]{2,6})([\/\w\.-]*)*\/?/", "a website", $hallo);
print $str;
?>
I would like to test for a language match in a url.
The URL will look like this: http://www.domainname.com/en/#m=4&guid=%some_param%
I want to check whether there is an existing language code within the URL. I was thinking of something along these lines:
^(.*:)\/\/([a-z\-.]+)(:[0-9]+)?(.*)$
or
^(http|https:)\/\/([a-z\-.]+)(:[0-9]+)?(.*)$
I'm not that sharp with regex. Can anyone help or point me in the right direction?
Try this:
[https]+://[a-z-]+.([a-z])+/
http://www.regexr.com/ is an easy site for building and testing regexes.
If you know the data you are testing is a url then I would not bother adding all of the url parts to the regex. Keep it simple like: /\/[a-z]{2}\// That looks for a two letter combination between two forward slashes. If you need to capture the language code then wrap it in parentheses: /\/([a-z]{2})\//
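For what it's worth, here is a minimal sketch of that capture in PHP. It is a slight variation rather than the exact pattern above: it assumes the language code is the first path segment, and the function name languageFromUrl is made up for the example. It uses parse_url() to isolate the path first, so a two-letter folder deeper in the URL is not mistaken for a language.

```php
<?php
// Hypothetical helper: returns the two-letter code found in the first path
// segment of $url, or null when the first segment is not two letters.
function languageFromUrl(string $url): ?string
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return null; // no path component (or an unparsable URL)
    }

    // A slash, exactly two lowercase letters, then a slash or end of path.
    if (preg_match('~^/([a-z]{2})(?:/|$)~', $path, $m) === 1) {
        return $m[1];
    }

    return null;
}

var_dump(languageFromUrl('http://www.domainname.com/en/#m=4&guid=%some_param%'));
```

If the code can appear anywhere in the path, the simpler /\/([a-z]{2})\// pattern from the answer is the one to use instead.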
I have a blog page on my website where a user edits a post by going to a URL like this: http://www.example.com/blog?edit=blog post here. The script used to replace the spaces with %20 like it should, but now it is replacing the spaces with %2520, and the script can't search the database because there is no post called blog%20post%20here. I was going to go down the path of preg_replace, so I tried this:
preg_replace("/%2520/"," ",$_GET['edit']);
but that didn't seem to work.
I have never used preg_replace() and only just read up on it in the manual. If someone could point me down the right path and/or show me how to use preg_replace correctly, that would be awesome.
Sounds like you're double-escaping somewhere when generating the urls. %25 is the coding for the % character, so it sounds like it's going from %20 to %2520.
As an aside, there are better ways to decode that URL (urldecode(), for example), so perhaps preg_replace isn't really necessary...
EDIT: oh, and you should just use urlencode to generate the url in the first place.
For %2520
<?php echo urldecode(urldecode($_GET['edit'])); ?>
For %20
<?php echo urldecode($_GET['edit']); ?>
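To see where %2520 comes from, here is a small sketch that double-encodes a title on purpose and then decodes it. The variable names are made up, and parse_str() is used only as a stand-in for how PHP fills $_GET, since it applies the same single round of URL decoding. Keep in mind that PHP has already decoded the values in $_GET once, so a double-encoded link arrives there as %20, not %2520.

```php
<?php
$title = 'blog post here';

// Encoding once is what a correctly generated link should contain.
$once = rawurlencode($title);   // blog%20post%20here

// Encoding the already-encoded value turns each "%" into "%25".
$twice = rawurlencode($once);   // blog%2520post%2520here

// parse_str() decodes once, the same way PHP populates $_GET.
parse_str('edit=' . $twice, $query);
$fromQuery = $query['edit'];    // blog%20post%20here

// One more decode undoes the accidental second encoding.
$decoded = rawurldecode($fromQuery);

echo $once, "\n", $twice, "\n", $fromQuery, "\n", $decoded, "\n";
```

Decoding a second time only hides the symptom, though. The cleaner fix is to find the place that encodes the title twice when the link is generated.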
The following code is used to find a URL in a string with PHP:
$string = "Hello http://www.bytes.com world www.yahoo.com";
preg_match('/(http:\/\/[^\s]+)/', $string, $text);
$hypertext = "<a href=\"" . $text[0] . "\">" . $text[0] . "</a>";
$newString = preg_replace('/(http:\/\/[^\s]+)/', $hypertext, $string);
echo $newString;
Well, it shows a link, but if I provide several links it doesn't work, and if I write a URL without http:// it isn't turned into a link. I want every link that is provided to be active, like stackoverflow.com.
Any help please..
A working method for linking with http/https/ftp/ftps/scp/scps:
$newStr = preg_replace('!(http|ftp|scp)(s)?:\/\/[a-zA-Z0-9.?&_/]+!', "<a href=\"\\0\">\\0</a>", $str);
I strongly advise NOT linking text just because it contains a dot, because that would also treat strings like PHP 5.2 and ASP.NET as links, which is hardly acceptable.
Update: if you want www. strings as well, take a look at this.
If you want to detect something like stackoverflow.com, then you're going to have to check for all possible TLDs to rule out something like Web 2.0, which is quite a long list. Still, this is also going to match something such as ASP.NET.
The regex would look something like this:
$hypertext = preg_replace(
'{\b(?:http://)?(www\.)?([^\s]+)(\.com|\.org|\.net)\b}mi',
'<a href="http://$1$2$3">$1$2$3</a>',
$text
);
This only matches domains ending in .com, .org and .net... as previously stated, you would have to extend this list to match all TLDs
@axiomer, your example doesn't work if the link is in this format:
https://stackoverflow.com?val1=bla&val2blablabla%20bla%20bla.bl
correct solution:
preg_replace('!(http|ftp|scp)(s)?:\/\/[a-zA-Z0-9.?%=&_/]+!', "<a href=\"\\0\">\\0</a>", $content);
produces:
https://stackoverflow.com?val1=bla&val2blablabla%20bla%20bla.bl
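If it helps, here is a rough sketch that links every http(s) and www. address in a string and HTML-escapes everything else along the way. It is not the code from the answers above. The function name linkify is made up, and the choice to prefix http:// for bare www. addresses is an assumption. It splits the text on URLs with preg_split() so that plain text and URLs can be escaped separately.

```php
<?php
// Hypothetical helper: wraps each URL in an anchor tag and escapes the rest
// of the text so it is safe to print as HTML.
function linkify(string $text): string
{
    // With one capture group and PREG_SPLIT_DELIM_CAPTURE the result
    // alternates: plain text, URL, plain text, URL, ...
    $parts = preg_split(
        '~(\b(?:https?://|www\.)[^\s<>"]+)~i',
        $text,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    ) ?: [$text];

    $out = '';
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            // Even positions hold ordinary text.
            $out .= htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
            continue;
        }
        // Odd positions hold URLs; give bare "www." addresses a scheme.
        $href = stripos($part, 'www.') === 0 ? 'http://' . $part : $part;
        $out .= '<a href="' . htmlspecialchars($href, ENT_QUOTES, 'UTF-8') . '">'
            . htmlspecialchars($part, ENT_QUOTES, 'UTF-8') . '</a>';
    }

    return $out;
}

echo linkify('Hello http://www.bytes.com world www.yahoo.com');
```

Bare domains such as stackoverflow.com are deliberately left alone here, for the reasons given above about PHP 5.2 and ASP.NET.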
I'd like to do some operations on incoming e-mails. Namely transform all 6 digit numbers into links which lead to a url based on the number.
I don't want to open a huge can of worms, in terms of APIs or languages besides PHP, this isn't that much of a timesaver, but it would be nice. Anyone done anything like this? Just looking to get pointed in the right direction !
You can use a regex to find your numbers and replace them with your links. Since I do not know your link structure, I made one up.
Here is a simple example:
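$str = "Testing 385758 String";
// http://www.example.com/ is only a placeholder; substitute your own link structure.
$str = preg_replace('/(\d{6})/', '<a href="http://www.example.com/$1">$1</a>', $str);
This will turn $str into:
Testing <a href="http://www.example.com/385758">385758</a> String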
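One thing to watch with a plain \d{6} is that it also matches the first six digits of a longer number. Below is a sketch that only links numbers of exactly six digits. The function name linkSixDigitNumbers and the base URL in the usage line are made up for illustration.

```php
<?php
// Hypothetical helper: links every standalone six-digit number in $text to
// $baseUrl followed by that number.
function linkSixDigitNumbers(string $text, string $baseUrl): string
{
    $result = preg_replace_callback(
        // The lookarounds reject digits directly before or after the match,
        // so 7-digit (or longer) numbers are left untouched.
        '/(?<!\d)\d{6}(?!\d)/',
        function (array $m) use ($baseUrl): string {
            $href = htmlspecialchars($baseUrl . $m[0], ENT_QUOTES, 'UTF-8');
            return '<a href="' . $href . '">' . $m[0] . '</a>';
        },
        $text
    );

    // preg_replace_callback returns null on a regex error; keep the input then.
    return $result ?? $text;
}

echo linkSixDigitNumbers('Testing 385758 String', 'https://example.com/item/');
```

A callback is used instead of a replacement string so that a base URL containing $ or \ cannot be misread as a backreference.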
I'm trying to find a reliable solution to extract a url from a string of characters. I have a site where users answer questions and in the source box, where they enter their source of information, I allow them to enter a url. I want to extract that url and make it a hyperlink. Similar to how Yahoo Answers does it.
Does anyone know a reliable solution that can do this?
All the solutions I have found work for some URLs but not for others.
Thanks
John Gruber has spent a fair amount of time perfecting the "one regex to rule them all" for link detection. Using preg_replace() as mentioned in the other answers, using the following regex should be one of the most accurate, if not the most accurate, method for detecting a link:
(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))
If you only wanted to match HTTP/HTTPS:
(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))
$string = preg_replace('/https?:\/\/[^\s"<>]+/', '<a href="$0">$0</a>', $string);
It only matches http/https, but that's really the only protocol you want to turn into a link. If you want others, you can change it like this:
$string = preg_replace('/(https?|ssh|ftp):\/\/[^\s"]+/', '<a href="$0">$0</a>', $string);
There are a lot of edge cases with URLs: a URL could contain brackets, lack a protocol, and so on. That's why a regex alone is not enough.
I created a PHP library that deals with lots of these edge cases: Url highlight.
You can extract URLs from a string or highlight them directly.
Example:
<?php
use VStelmakh\UrlHighlight\UrlHighlight;
$urlHighlight = new UrlHighlight();
// Extract urls
$urlHighlight->getUrls("This is example http://example.com.");
// return: ['http://example.com']
// Make urls as hyperlinks
$urlHighlight->highlightUrls('Hello, http://example.com.');
// return: 'Hello, <a href="http://example.com">http://example.com</a>.'
For more details see readme. For covered url cases see test.
Yahoo! Answers does a fairly good job of link identification when the link is written properly and separated from other text, but it isn't very good at excluding trailing punctuation. For example, "The links are http://example.com/somepage.php, http://example.com/somepage2.php, and http://example.com/somepage3.php." will include the commas on the first two links and the period on the third.
But if that is acceptable, then patterns like this should do it:
\<http:[^ ]+\>
It looks like Stack Overflow's parser is better. Is it open source?
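If the trailing punctuation is not acceptable, one simple way around it is to match greedily and then trim sentence punctuation from the end of each match. A minimal sketch follows. The function name extractUrls and the set of trimmed characters are my own choices, not anything Yahoo! Answers or Stack Overflow is known to use.

```php
<?php
// Hypothetical helper: returns all http(s) URLs found in $text, with
// trailing sentence punctuation removed.
function extractUrls(string $text): array
{
    preg_match_all('~https?://[^\s<>"]+~i', $text, $matches);

    return array_map(
        static function (string $url): string {
            // A comma or period at the very end is far more likely to belong
            // to the sentence than to the URL.
            return rtrim($url, '.,;:!?');
        },
        $matches[0]
    );
}

print_r(extractUrls(
    'The links are http://example.com/somepage.php, '
    . 'http://example.com/somepage2.php, and http://example.com/somepage3.php.'
));
```

The trade-off is that a URL which really does end in one of those characters gets shortened, which is the kind of edge case the more elaborate patterns above try to handle.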
This code worked for me.
function makeLink($string){
/*** make sure there is an http:// on all URLs ***/
$string = preg_replace("/([^\w\/])(www\.[a-z0-9\-]+\.[a-z0-9\-]+)/i", "$1http://$2",$string);
/*** make all URLs links ***/
$string = preg_replace("/([\w]+:\/\/[\w\-?&;#~=\.\/\@]+[\w\/])/i","<a target=\"_blank\" href=\"$1\">$1</a>",$string);
/*** make all emails hot links ***/
$string = preg_replace("/([\w\-?&;#~=\.\/]+\@(\[?)[a-zA-Z0-9\-\.]+\.([a-zA-Z]{2,3}|[0-9]{1,3})(\]?))/i","<a href=\"mailto:$1\">$1</a>",$string);
return $string;
}