PHP regex: Converting and manipulating text into specific HTML - php

I have the following text:
I'm a link - http://google.com
I need to convert this into the following HTML:
<a href="http://google.com">I'm a link</a>
How can I achieve this in PHP? I'm assuming this needs some sort of regex to search for the actual text and link then manipulate the text into the HTML but I wouldn't know where to start, any help would be greatly appreciated.

If it's always like this, you don't really need regex here:
$input = "I'm a link - http://google.com";
list($text, $link) = explode(" - ", $input);
echo "<a href='". $link ."'>". $text ."</a>";
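A slightly more defensive variant of the explode idea could look like the sketch below. The helper name linkFromLine() is made up for illustration, and the escaping with htmlspecialchars() is an addition that the original snippet does not have; it assumes the label and URL may contain quotes or angle brackets.

```php
<?php
// Sketch: turn "label - url" into an anchor tag without regex.
// linkFromLine() is a hypothetical helper name, not part of the original answer.
function linkFromLine(string $input): string
{
    // A limit of 2 keeps any later " - " inside the URL part intact.
    $parts = explode(' - ', $input, 2);
    if (count($parts) !== 2) {
        // No separator found: return the text escaped but unlinked.
        return htmlspecialchars($input, ENT_QUOTES);
    }
    [$text, $link] = $parts;
    // Escape both pieces so quotes or angle brackets cannot break the markup.
    return '<a href="' . htmlspecialchars($link, ENT_QUOTES) . '">'
        . htmlspecialchars($text, ENT_QUOTES) . '</a>';
}

echo linkFromLine("I'm a link - http://google.com"), PHP_EOL;
```

With ENT_QUOTES the apostrophe in the label is emitted as an HTML entity, which renders the same in a browser.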

If a regex is needed, here's a fully functional example:
<?php
$content = <<<EOT
test
http://google.com
test
EOT;
$content = preg_replace(
'/(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?/',
'<a href=\'$0\'>I\'m a link</a>',
$content
);
echo $content;
?>
Example here: http://phpfiddle.io/fiddle/1166866001
If it's always one line of text though, it would be better to go with 1nflktd's solution.

Try with capturing groups and substitution:
^([^-]*) - (.*)$
Sample code:
$re = "/^([^-]*) - (.*)$/i";
$str = "I'm a link - http://google.com";
$subst = '<a href="$2">$1</a>';
$result = preg_replace($re, $subst, $str);
Output:
<a href="http://google.com">I'm a link</a>
Pattern Explanation:
^            the beginning of the string
(            group and capture to \1:
  [^-]*      any character except: '-' (0 or more times)
)            end of \1
 -           ' - '
(            group and capture to \2:
  .*         any character except \n (0 or more times)
)            end of \2
$            before an optional \n, and the end of the string
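If the input has several such lines, the same two-group pattern can probably be applied per line with the m modifier. The following is only a sketch: the second sample line and the variable names are invented, and the htmlspecialchars() escaping is an assumption about what the output should look like.

```php
<?php
// Sketch: apply the two-group pattern to every line of a multi-line block.
// The sample text and the variable names are illustrative only.
$lines = "I'm a link - http://google.com\nDocs - http://php.net";

$html = preg_replace_callback(
    '/^([^-]*) - (.*)$/m',            // "m" lets ^ and $ match at each line
    function (array $m): string {
        // $m[1] is the label, $m[2] is the URL; escape both before output.
        return '<a href="' . htmlspecialchars($m[2], ENT_QUOTES) . '">'
            . htmlspecialchars($m[1], ENT_QUOTES) . '</a>';
    },
    $lines
);

echo $html, PHP_EOL;
```

Note that [^-]* stops at the first hyphen, so a label that itself contains a "-" will not match this pattern.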

Related

Operation on string in PHP. Remove part of string

How can I remove part of a string, for example:
##lang_eng_begin##test##lang_eng_end##
##lang_fr_begin##school##lang_fr_end##
##lang_esp_begin##test33##lang_esp_end##
I always want to pull out the middle of the string: test, school, test33.
I read about ltrim, substr and others, but I had no good idea how to do this, because the language code in each string can have a different length, for example:
'eng', 'fr'
I just want the string in the middle, between ## and ##. Maybe someone can help me? I tried:
foreach ($article as $art) {
$title = $art->titl = str_replace("##lang_eng_begin##", "", $art->title);
$art->cleanTitle = str_replace("##lang_eng_end##", "", $title);
}
But here
##lang_eng_end##
can change to
##lang_ger_end##
in the next row, so I have no idea how to fix that.
If your strings are always in this format, an explode way looks easy:
$str = "##lang_eng_begin##test##lang_eng_end## ";
$res = explode("##", $str)[2];
echo $res;
You may use a regex and extract the value in between the non-starting ## and next ##:
$re = "/(?!^)##(.*?)##/";
$str = "##lang_eng_begin##test##lang_eng_end## ";
preg_match($re, $str, $match);
print_r($match[1]);
Here, the regex matches a ## that is not at the string start ((?!^)##), then captures into Group 1 any 0+ chars other than newline, as few as possible ((.*?)), up to the next ## substring.
Or remove all ##...## substrings with preg_replace:
$re = "/##.*?##/";
$str = "##lang_eng_begin##test##lang_eng_end## ";
echo preg_replace($re, "", $str);
Here, we just remove all non-overlapping substrings that begin with ##, followed by any 0+ chars other than a newline, up to the first ##.
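If all three lines live in one string, something along these lines might pull every value in a single call. This is a sketch that swaps in a stricter pattern than the ones above: it assumes the tags always have the form ##lang_XX_begin## / ##lang_XX_end## with a lowercase code, and uses a backreference so the end tag must carry the same code.

```php
<?php
// Sketch: pull every value wrapped in ##lang_XX_begin## ... ##lang_XX_end##.
// The backreference \1 requires the end tag to carry the same language code.
$input = "##lang_eng_begin##test##lang_eng_end##\n"
       . "##lang_fr_begin##school##lang_fr_end##\n"
       . "##lang_esp_begin##test33##lang_esp_end##";

preg_match_all('/##lang_([a-z]+)_begin##(.*?)##lang_\1_end##/', $input, $m);

// Map language code => extracted text.
$byLang = array_combine($m[1], $m[2]);
print_r($byLang);
```

Keying the result by language code makes it easy to pick one translation later, e.g. $byLang['fr'].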

PHP only leave text in between particular characters

I have many strings that look like this:
[additional-text Sample text...]
There is always an opening bracket + additional-text + single space and a closing bracket in the end and I need to remove all of them, so that the final string would look like this:
Sample text...
Any help or guidance is much appreciated.
You can use this to get all matches within a text block:
preg_match_all("/\[additional-text (.*?)\]/",$text,$matches);
All your texts will be in $matches[1]. So that will be:
$text = "[additional-text Sample text...]dsfg fgfd[additional-text Sample text2...] foo bar adfd as ff";
preg_match_all("/\[additional-text (.*?)\]/",$text,$matches);
var_export($matches[1]);
Get the substring you want to keep as a captured group:
^\[\S+\s([^]]+)\]$
Now in the replacement, use the only captured group, \1.
You can use:
$re = '/\[\S+\s|\]/';
$str = "[additional-text Sample text...]";
$result = preg_replace($re, '', $str);
//=> Sample text...
Use substr to remove the first 17 characters. Use a regex to remove the closing bracket at the end:
$val = '[additional-text Sample text...]';
$text = preg_replace('#\]$#', '', substr($val, 17));
You can also do this:
$a = '[additional-text Sample text...]';
$a= ltrim($a,"[additional-text ");
echo $a= rtrim($a,"]");
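One caveat with the ltrim/rtrim approach: the second argument of both functions is a set of characters to strip, not a literal prefix or suffix. The sample above happens to work because "Sample" starts with an uppercase S that is not in the set. A small sketch with a made-up input shows what can go wrong:

```php
<?php
// ltrim()'s second argument is a set of characters, not a prefix.
// The input below is a made-up example whose content starts with
// letters that also occur in "[additional-text ".
$a = ltrim('[additional-text text sample]', '[additional-text ');
$a = rtrim($a, ']');

// "text " is stripped as well, because t, e, x and the space are in the set.
echo $a, PHP_EOL;
```

So this variant is only safe when the remaining text cannot start with any character from the prefix.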
There is no need for regex here, use substr:
$s = "[additional-text Sample text...]";
echo substr($s, 17, strlen($s)-18);
Here 17 is the length of "[additional-text " (including the trailing space) and 18 is that length plus 1 for the closing ].
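To avoid the hard-coded 17 and 18, the offsets could be computed from the prefix and suffix themselves. A minimal sketch, assuming the wrapper is always a fixed prefix and suffix; stripWrapper() and its parameter names are hypothetical:

```php
<?php
// Sketch: strip a known prefix and suffix without hard-coded offsets.
// stripWrapper() and its parameters are hypothetical names for illustration.
function stripWrapper(string $s, string $prefix = '[additional-text ', string $suffix = ']'): string
{
    $start = strlen($prefix);
    $inner = strlen($s) - $start - strlen($suffix);
    // Only cut when the string really has this prefix and suffix.
    if ($inner < 0
        || substr($s, 0, $start) !== $prefix
        || substr($s, -strlen($suffix)) !== $suffix) {
        return $s;
    }
    return substr($s, $start, $inner);
}

echo stripWrapper('[additional-text Sample text...]'), PHP_EOL;
```

Strings that do not carry the wrapper are returned unchanged instead of being cut at the wrong place.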
A regex solution is also basic:
^\[additional-text (.*)]$
or - if there can be no ] before the end:
^\[additional-text ([^]]*)]$
Then replace with the $1 backreference. In PHP:
$result = preg_replace('~^\[additional-text (.*)]$~', "$1", "[additional-text Sample text...]");
echo $result;
Pattern details:
^ - start of string
\[ - a literal [
additional-text - literal text
(.*) - zero or more characters other than a newline as many as possible up to
]$ - a ] at the end of the string.
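If the tags are embedded in a longer text rather than being the whole string, dropping the ^ and $ anchors should let preg_replace unwrap each one in place. A sketch with an invented sample string:

```php
<?php
// Sketch: unwrap every [additional-text ...] tag inside a larger text.
// The sample string is illustrative only.
$text = 'Intro [additional-text Sample text...] middle [additional-text More...] end';

// No ^/$ anchors here, so each bracketed tag is unwrapped in place;
// [^\]]* stops at the first closing bracket.
$clean = preg_replace('/\[additional-text ([^\]]*)\]/', '$1', $text);

echo $clean, PHP_EOL;
```

The negated class is used instead of .* so that one tag cannot swallow the text up to a later closing bracket.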

How to not perform preg_replace if subject starts with quote

I'm trying to convert plain links to HTML links using preg_replace. However it's replacing links that are already converted.
To combat this I'd like it to ignore the replacement if the link starts with a quote.
I think a positive lookahead may be needed but everything I've tried hasn't worked.
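$string = '<a href="http://www.example.com">test</a> http://www.example.com';
$string = preg_replace("/((https?:\/\/[\w]+[^ \,\"\n\r\t<]*))/is", "<a href=\"$1\">$1</a>", $string);
var_dump($string);
The above outputs:
<a href="<a href="http://www.example.com">http://www.example.com</a>">test</a> <a href="http://www.example.com">http://www.example.com</a>
When it should output:
<a href="http://www.example.com">test</a> <a href="http://www.example.com">http://www.example.com</a>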
You might get along with lookarounds.
Lookarounds are zero-width assertions that make sure to match/not to match anything immediately around the string in question. They do not consume any characters.
That being said, a negative lookbehind might be what you need in your situation:
(?<![">])\bhttps?://\S+\b
In PHP this would be:
<?php
$string = 'I want to be transformed to a proper link: http://www.google.com ';
$string .= 'But please leave me alone ';
$string .= '(https://www.google.com).';
$regex = '~ # delimiter
(?<![">]) # a neg. lookbehind
https?://\S+ # http:// or https:// followed by not a whitespace
\b # a word boundary
~x'; # verbose to enable this explanation.
$string = preg_replace($regex, "<a href='$0'>$0</a>", $string);
echo $string;
?>
See a demo on ideone.com. However, maybe a parser is more appropriate.
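To see the lookbehind at work on a string that already contains a converted link, a sketch like the following may help. The input is an assumption about the kind of string the question deals with, and the replacement uses double-quoted href values so that a second pass is also skipped by the lookbehind.

```php
<?php
// Sketch: link bare URLs but skip ones already inside an anchor.
// The sample input is an assumption about what the question's string looks like.
$string = '<a href="http://www.example.com">test</a> http://www.example.com';

// (?<![">]) rejects a URL directly preceded by a double quote or ">",
// which covers href="..." values and URLs used as link text.
$pattern = '~(?<![">])\bhttps?://\S+\b~';
$replacement = '<a href="$0">$0</a>';

$linked = preg_replace($pattern, $replacement, $string);
echo $linked, PHP_EOL;

// Running the same replacement again leaves the string unchanged.
$again = preg_replace($pattern, $replacement, $linked);
```

With single-quoted href values, as in the answer's own replacement string, a second pass would not be protected, because ' is not in the lookbehind's character class.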
Since you can use Arrays in preg_replace, this might be convenient to use depending on what you want to achieve:
<?php
$string = 'test http://www.example.com';
$rx = array("&(<a.+https?:\/\/[\w]+[^ \,\"\n\r\t<]*>)(.*)(<\/a\>)&si", "&(\s){1,}(https?:\/\/[\w]+[^ \,\"\n\r\t<]*)&");
$rp = array("$1$2$3", "$2");
$string = preg_replace($rx,$rp, $string);
var_dump($string);
// DUMPS:
// 'testhttp://www.example.com'
The Idea
You can split your string at the already existing anchors, and only parse the pieces in between.
The Code
$input = '<a href="http://www.example.com">test</a> http://www.example.com';
// Split the string at existing anchors
// PREG_SPLIT_DELIM_CAPTURE flag includes the delimiters in the results set
// (it is the fourth argument; the third is the limit, -1 meaning no limit)
$parts = preg_split('/(<a.*?>.*?<\/a>)/is', $input, -1, PREG_SPLIT_DELIM_CAPTURE);
// Use array_map to parse each piece, and then join all pieces together
$output = join('', array_map(function ($key, $part) {
// Because we return the delimiter in the results set,
// every $part with an uneven key is an anchor and is returned unchanged;
// only the even-keyed pieces in between are parsed.
return $key % 2
? $part
: preg_replace("/((https?:\/\/[\w]+[^ \,\"\n\r\t<]*))/is", "<a href=\"$1\">$1</a>", $part);
}, array_keys($parts), $parts));

Regex in PHP: Replacing text between strings

Okay I have made some progress on a problem I am solving, but need some help with a small glitch.
I need to remove all characters from the filenames in the specific path images/prices/ BEFORE the first digit, except for where there is from_, in which case remove all characters from the filename BEFORE from_.
Examples:
BEFORE AFTER
images/prices/abcde40.gif > images/prices/40.gif
images/prices/UgfVe5559.gif > images/prices/5559.gif
images/prices/wedsxcdfrom_88457.gif > images/prices/from_88457.gif
What I've done:
$pattern = '%images/(.+?)/([^0-9]+?)(from_|)([0-9]+?)\.gif%';
$replace = 'images/\\1/\\3\\4.gif';
$string = "AAA images/prices/abcde40.gif BBB images/prices/wedsxcdfrom_88457.gif CCC images/prices/UgfVe5559.gif DDD";
$newstring = str_ireplace('from_','733694521548',$string);
while(preg_match($pattern,$newstring)){
$newstring=preg_replace($pattern,$replace,$newstring);
}
$newstring=str_ireplace('733694521548','from_',$newstring);
echo "Original:\n$string\n\nNew:\n$newstring";
My expected output is:
AAA images/prices/40.gif BBB images/prices/from_88457.gif CCC images/prices/5559.gif DDD
But instead I am getting:
AAA images/prices/40.gif BBB images/from_88457.gif CCC images/5559.gif DDD
The prices/ part of the path is missing from the last two paths.
Note that the AAA, BBB etc. portions are just placeholders. In reality the paths are scattered all across a raw HTML file parsed into a string, so we cannot rely on any pattern in between occurrences of the text to be replaced.
Also, I know the method I am using of substituting from_ is hacky, but this is purely for a local file operation and not for a production server, so I am okay with it. However if there is a better way, I am all ears!
Thanks for any assistance.
You can use lookaround assertions:
preg_replace('~(?<=/)(?:([a-z]+)(?=\d+\.gif)|(\w+)(?=from_))~i', '', $value);
Explanation:
(?<=/) # If preceded by a '/':
(?: # Begin group
([a-z]+) # Match letters a-z, one or more times
(?=\d+\.gif) # If followed by digit(s) and '.gif'
| # OR
(\w+) # Match word characters, one or more times
(?=from_) # If followed by 'from_'
) # End group
Code:
$pattern = '~(?<=/)(?:([a-z]+)(?=\d+\.gif)|(\w+)(?=from_))~i';
echo preg_replace($pattern, '', $string);
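For comparison, here is a sketch of a single-pass alternative that uses capture groups instead of lookarounds and needs no placeholder swap for from_. It is a different pattern from the one above, and it assumes the unwanted prefix consists only of letters and underscores.

```php
<?php
// Sketch: one pass, no placeholder swap for "from_".
$string = 'AAA images/prices/abcde40.gif BBB images/prices/wedsxcdfrom_88457.gif '
        . 'CCC images/prices/UgfVe5559.gif DDD';

// Group 1 keeps the directory, [a-z_]*? lazily skips the unwanted prefix,
// group 2 keeps an optional "from_" plus the digits and extension.
$pattern = '~(images/prices/)[a-z_]*?((?:from_)?\d+\.gif)~i';

$newstring = preg_replace($pattern, '$1$2', $string);
echo $newstring, PHP_EOL;
```

Because the prefix is matched lazily, the optional from_ is picked up by group 2 rather than being swallowed by the prefix.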
You can use this regex for replacement:
^(images/prices/)\D*?(from_)?(\d+\..+)$
And use this expression for replacement:
$1$2$3
Code:
$re = '~^(images/prices/)\D*?(from_)?(\d+\..+)$~m';
$str = "images/prices/abcde40.gif\nimages/prices/UgfVe5559.gif\nimages/prices/wedsxcdfrom_88457.gif";
$result = preg_replace($re, '$1$2$3', $str);
You can try lookarounds as well. Just replace the match with an empty string.
(?<=^images\/prices\/).*?(?=(from_)?\d+\.gif$)
Sample code:
$re = "/(?<=^images\\/prices\\/).*?(?=(from_)?\\d+\\.gif$)/m";
$str = "images/prices/abcde40.gif\nimages/prices/UgfVe5559.gif\nimages/prices/wedsxcdfrom_88457.gif";
$subst = '';
$result = preg_replace($re, $subst, $str);
If the string is not multi-line, use \b (a word boundary) instead of ^ and $, which match the start and end of the line/string:
(?<=\bimages\/prices\/).*?(?=(from_)?\d+\.gif\b)
$arr = array(
'images/prices/abcde40.gif',
'images/prices/UgfVe5559.gif',
'images/prices/wedsxcdfrom_88457.gif'
);
foreach($arr as $str){
echo preg_replace('#images/prices/.*?((from_|\d).*)#i','images/prices/$1',$str);
}
EDIT:
$str = 'AAA images/prices/abcde40.gif BBB images/prices/wedsxcdfrom_88457.gif CCC images/prices/UgfVe5559.gif DDD';
echo preg_replace('#images/prices/.*?((from_|\d).*?\s|$)#i','images/prices/$1',$str), PHP_EOL;

Change URL and append file extension with REGEX

I've been reading up on RegEx docs but I must say I'm still a bit out of my element so I apologize for not posting what I have tried because it was all just plain wrong.
Here's the issue:
I've got images using the following source:
src="http://samplesite/.a/6a015433877b2b970c01a3fd22309b970b-800wi"
I need to get to this:
src="http://newsite.com/wp-content/uploads/2014/07/6a015433877b2b970c01a3fd22309b970b-800wi.jpg"
Essentially removing the /.a/ from the URL and appending a .jpg to the end of the image file name. If it helps in a solution I'm using this plug-in: http://urbangiraffe.com/plugins/search-regex/
Thanks All.
This might help you.
(?<=src="http:\/\/)samplesite\/\.a\/([^"]*)
Sample code:
$re = "/(?<=src=\"http:\/\/)samplesite\/\.a\/([^\"]*)/";
$str = "src=\"http://samplesite/.a/6a015433877b2b970c01a3fd22309b970b-800wi\"";
$subst = 'newsite.com/wp-content/uploads/2014/07/$1.jpg';
$result = preg_replace($re, $subst, $str);
Output:
src="http://newsite.com/wp-content/uploads/2014/07/6a015433877b2b970c01a3fd22309b970b-800wi.jpg"
Pattern Description:
(?<=             look behind to see if there is:
  src="http:     'src="http:'
  \/             '/'
  \/             '/'
)                end of look-behind
samplesite       'samplesite'
\/               '/'
\.               '.'
a                'a'
\/               '/'
(                group and capture to \1:
  [^"]*          any character except: '"' (0 or more times (matching the most amount possible))
)                end of \1
You can try it without using Positive Lookbehind as well
(src="http:\/\/)samplesite\/\.a\/([^"]*)
Sample code:
$re = "/(src=\"http:\/\/)samplesite\/\.a\/([^\"]*)/";
$str = "src=\"http://samplesite/.a/6a015433877b2b970c01a3fd22309b970b-800wi\"";
$subst = '$1newsite.com/wp-content/uploads/2014/07/$2.jpg';
$result = preg_replace($re, $subst, $str);
You can use this:
$replaced = preg_replace('~src="http://samplesite/\.a/([^"]+)"~',
'src="http://newsite.com/wp-content/uploads/2014/07/\1.jpg"',
$yourstring);
Explanation
([^"]+) matches any characters that are not a " to Group 1
\1 inserts Group 1 in the replacement.
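Applied to a fragment with more than one image, the same replacement might look like this. It is only a sketch: the second image and the variable names are invented for illustration, and $1 is used in place of \1, which preg_replace treats the same way.

```php
<?php
// Sketch: rewrite every matching src attribute in an HTML fragment.
// The second image and the variable names are illustrative assumptions.
$html = '<img src="http://samplesite/.a/6a015433877b2b970c01a3fd22309b970b-800wi">'
      . '<img src="http://samplesite/.a/abc123-800wi">';

$replaced = preg_replace(
    '~src="http://samplesite/\.a/([^"]+)"~',
    'src="http://newsite.com/wp-content/uploads/2014/07/$1.jpg"',
    $html
);

echo $replaced, PHP_EOL;
```

If a digit ever had to follow the group reference directly, the ${1} form would keep the group number unambiguous.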
