preg_match_all and foreach only replacing last match - php

I have the following code, which should make plain text links clickable. However, if there are several links, it only replaces the last one.
Code:
$nc = preg_match_all('#<pre[\s\S]*</pre>#U', $postbits, $matches_code);
foreach($matches_code[0] AS $match_code)
{
$match = null;
$matches = null;
$url_regex = '#https?://(\w*:\w*#)?[-\w.]+(:\d+)?(/([\w/_.]*(\?\S+)?)?)?[^<\.,:;"\'\s]+#';
$n = preg_match_all($url_regex, $match_code, $matches);
foreach($matches[0] AS $match)
{
$html_url = '<a href="' . $match . '">' . $match . '</a>';
$match_string = str_replace($match, $html_url, $match_code);
}
$postbits = str_replace($match_code, $match_string, $postbits);
}
Result:
http://www.google.com
http://www.yahoo.com
http://www.microsoft.com/ <-- only this one is clickable
Expected result:
http://www.google.com
http://www.microsoft.com/
Where is my error?

if there are several links it only replaces the last one
Where is my error?
Actually, it's replacing all 3 links, but it replaces the original string each time.
foreach($matches[0] AS $match)
{
$html_url = '<a href="' . $match . '">' . $match . '</a>';
$match_string = str_replace($match, $html_url, $match_code);
}
The loop is executed 3 times, each time it replaces 1 link in $match_code and assigns the result to $match_string. On the first iteration, $match_string is assigned the result with a clickable google.com. On the second iteration, $match_string is assigned with a clickable yahoo.com. However, you've just replaced the original string, so google.com is not clickable now. That's why you only get your last link as a result.
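The smallest change that keeps the loop is to start from a copy of the original text and apply every replacement to that same copy. The following is a minimal sketch under a few assumptions: the sample input is made up, the URL pattern is the simple https?://\S+ one, and the anchor markup is assumed, since the question only says the links should become clickable.

```php
<?php
// Hypothetical input: one <pre> block containing three plain-text URLs.
$match_code = "<pre>\nhttp://www.google.com\nhttp://www.yahoo.com\nhttp://www.microsoft.com/\n</pre>";

preg_match_all('#https?://\S+#', $match_code, $matches);

// Start from the original text and keep building on the same variable,
// so each replacement is applied on top of the previous ones.
$match_string = $match_code;
foreach (array_unique($matches[0]) as $match) {
    // Assumed markup for a clickable link.
    $html_url = '<a href="' . $match . '">' . $match . '</a>';
    $match_string = str_replace($match, $html_url, $match_string);
}

echo $match_string, "\n";
```

Note that str_replace() replaces every occurrence of the search string, so a URL that is a prefix of another URL in the same block would be wrapped twice. A single regex-based replacement does not have that problem.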
There are a couple of things you may also want to correct in your code:
The regex #<pre[\s\S]*</pre>#U is better constructed as #<pre.*</pre>#Us. The class [\s\S]* is normally used in JavaScript, where there is no s flag to allow dots matching newlines.
I don't get why you're using that pattern to match URLs. I think you could simply use https?://\S+. I'll also link you to some alternatives here.
You're using 2 preg_match_all() calls and 1 str_replace() call for the same text, where you could wrap it up in 1 preg_replace().
Code
$postbits = "
<pre>
http://www.google.com
http://w...content-available-to-author-only...o.com
http://www.microsoft.com/ <-- only this one clickable
</pre>";
$regex = '#\G((?:(?!\A)|.*<pre)(?:(?!</pre>).)*)(https?://\S+?)#isU';
$repl = '\1<a href="\2">\2</a>';
$postbits = preg_replace( $regex, $repl, $postbits);
Regex
\G Anchors each match at the position where the previous match ended (or at the start of the subject for the first match).
Group 1
(?:(?!\A)|.*<pre) Either continues right after the previous match (anywhere but the beginning of the string), or skips ahead to the next <pre tag: the first one in the string, or the following one when no more URLs are found in the current tag.
(?:(?!</pre>).)*) Consumes any chars inside a <pre> tag.
Group 2
(https?://\S+?) Matches 1 URL.

Related

PHP | Replicate specific word excluding the title attribute

I'm trying to replace the word "custom" with <span>custom</span>.
It works with the str_replace() function, but that also replaces the word inside the title attribute, which I don't want, because a span tag inside a title attribute is an error.
How can I replace the word "custom" without touching the title attribute?
This is my code:
$oldText = "custom";
$newText = "<span>custom</span>";
$string = "<a href='#' title='Products custom'>Products custom</a>";
str_ireplace($oldText, $newText,$string);
This is just one example.
The word custom can also be placed in the middle of a string or at the beginning...
Thanks
You'll probably have to use PHP's DOM parser to do that. Writing a regular expression to solve it will not work in all cases.
A) With DOM
I would start off with this Stack Overflow answer and then change it a bit to accomplish what you want to do. As you are replacing custom with <span>custom</span>, you'll be creating a new DOM element. Replacing the text content won't work because <span> will be escaped and turned into &lt;span&gt;.
So I would do this:
use preg_match_all() with a pattern such as /\bcustom\b/ to get all the offsets of the found items in the text:
// Search for the word, but delimited by word boundaries to
// avoid matching 'custom' in 'customization' or 'customer'.
$pattern = '/\b' . preg_quote($word_to_search, '/') . '\b/';
if (preg_match_all($pattern, $child->wholeText, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE)) {
var_export($matches);
}
convert these offsets in bytes to offsets in chars (this is because UTF-8 can have chars of 1 or n bytes):
function char_offset($string, $byte_offset, $encoding = null)
{
$substr = substr($string, 0, $byte_offset);
return mb_strlen($substr, $encoding ?: mb_internal_encoding());
}
use DOMText::splitText() to split the text nodes into two text nodes with the offset in char unit.
create a <span> element with DOMDocument::createElement()
$new_text = 'custom'; // or whatever.
$spanElement = $domNode->ownerDocument->createElement('span', $new_text);
insert this span element before the second text node with DOMNode::insertBefore()
correct the second text node to remove the custom word at the beginning.
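The steps above can be put together roughly as follows. This is a sketch, not a definitive implementation: the function name wrap_word_in_span() and the wrapper div are made up for illustration, the mbstring extension is assumed to be available, and instead of trimming the second text node afterwards it calls splitText() twice so that the matched word ends up in a text node of its own.

```php
<?php
// Hypothetical helper: wraps whole-word occurrences of $word in <span>,
// touching text nodes only, so attribute values such as title stay as they are.
function wrap_word_in_span(string $html, string $word): string
{
    $doc = new DOMDocument();
    // The wrapper div makes the fragment easy to find again; the XML
    // declaration prefix is a common trick to make loadHTML() read UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="wrap">' . $html . '</div>', LIBXML_NOERROR);
    $xpath = new DOMXPath($doc);
    $wrap = $xpath->query('//div[@id="wrap"]')->item(0);
    $pattern = '/\b' . preg_quote($word, '/') . '\b/u';

    // Snapshot the text nodes first, because the loop modifies the tree.
    foreach (iterator_to_array($xpath->query('.//text()', $wrap)) as $node) {
        while (preg_match($pattern, $node->data, $m, PREG_OFFSET_CAPTURE)) {
            list($found, $byteOffset) = $m[0];
            // preg offsets are in bytes; splitText() expects characters.
            $charOffset = mb_strlen(substr($node->data, 0, $byteOffset), 'UTF-8');
            $wordNode = $node->splitText($charOffset);
            $rest = $wordNode->splitText(mb_strlen($found, 'UTF-8'));
            // Move the matched text node into a new <span> element.
            $span = $doc->createElement('span');
            $wordNode->parentNode->replaceChild($span, $wordNode);
            $span->appendChild($wordNode);
            $node = $rest; // keep searching in the remaining text
        }
    }

    // Serialize only the children of the wrapper, not the whole document.
    $out = '';
    foreach ($wrap->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo wrap_word_in_span("<a href='#' title='Products custom'>Products custom</a>", 'custom'), "\n";
```

Because only text nodes are visited, the title attribute is never touched. The serializer may normalize details such as attribute quoting, so the output is not guaranteed to be byte-identical to the input outside the inserted span.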
B) With a regex
But if your case is always in a <a> tag then you could have a go with something like this: https://regex101.com/r/ksPqxe/1
For the regex explanation, look at the description in the right column. You could remove the i flag if you don't need a case-insensitive match. The s flag is used so that the . also matches new lines. The match has to be ungreedy, so either write .*? instead of .*, or, as I did in the end, set the U (ungreedy) flag and write plain .*.
This solution will not handle the case of several custom words in the link. But you'll probably only have it once. If you need that then use one regex to get the text content of the link and then a second one to replace all instances of custom by <span>custom</span>.
<?php
$pattern = '/(<a[^>]*>.*)\bcustom\b(.*<\/a>)/isU';
// Or without the ungreedy flag:
//$pattern = '/(<a[^>]*>.*?)\bcustom\b(.*?<\/a>)/is';
$substitution = '$1<span>custom</span>$2';
$inputs = [
"<a href='#' title='Products custom'>Products custom</a>",
'Custom stuff',
'<a href=\"https://www.customer.com\" title=\"customer"
data-type="custom">Customer stuff</a>',
'customize it!',
];
$results = [];
foreach ($inputs as $input) {
$result = preg_replace($pattern, $substitution, $input);
$results[] = "$input\n$result\n";
}
print implode(str_repeat('-', 80) . "\n", $results);
Output:
<a href='#' title='Products custom'>Products custom</a>
<a href='#' title='Products custom'>Products <span>custom</span></a>
--------------------------------------------------------------------------------
Custom stuff
Custom stuff
--------------------------------------------------------------------------------
<a href=\"https://www.customer.com\" title=\"customer"
data-type="custom">Customer stuff</a>
<a href=\"https://www.customer.com\" title=\"customer"
data-type="custom">Customer stuff</a>
--------------------------------------------------------------------------------
customize it!
customize it!
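For links that contain the word several times, the two-regex approach mentioned above could look like the following sketch. It assumes the link text contains no nested tags and that no attribute value contains a > character; the sample input is made up.

```php
<?php
// Outer pattern: opening <a ...> tag, link text without nested tags, closing tag.
$outer = '/(<a\b[^>]*>)([^<]*)(<\/a>)/i';

$input = "<a href='#' title='Products custom'>custom products, custom prices</a>";

$result = preg_replace_callback($outer, function (array $m): string {
    // Inner pattern: every whole-word "custom" in the link text only.
    // $0 re-inserts the matched text, so the original letter case is kept.
    $text = preg_replace('/\bcustom\b/i', '<span>$0</span>', $m[2]);
    return $m[1] . $text . $m[3];
}, $input);

echo $result, "\n";
```

The opening tag is captured separately and returned as it is, which is what keeps the title attribute out of reach of the inner replacement.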

PHP replace common words from my file

I've tried to make a tool in which you input a website, and when you click the submit button it cURLs all the text.
After all the cURLing, the text is stripped of tags and the words are counted. The result is an array named $frequency. If I echo it inside <pre> tags, it shows me everything just fine! (NOTE: I'm placing the contents in a file, $homepage = file_get_contents($file);, and this is what I work with in my code; I don't know if this matters or not.)
However, I don't really care if the word "or" is seen 200 times on a website; I only want the important words. So I have made an array with all the common words, which eventually ends up in the $common_words variable. But I can't seem to find a way to replace every word in $frequency with "" if it is also found in $common_words.
I've found this piece of code after some research:
$string = 'sand band or nor and where whereabouts foo';
$wordlist = array("or", "and", "where");
foreach ($wordlist as &$word) {
$word = '/\b' . preg_quote($word, '/') . '\b/';
}
$string = preg_replace($wordlist, '', $string);
var_dump($string);
If I copy paste this it works fine, removing the or, and, where from the string.
But replacing $string with $frequency or replacing $wordlist with $common_words will either not work or throw me an error like: Delimiter must not be alphanumeric or backslash
I hope I've formulated my question properly. If not, please tell me!
Thanks in advance
EDIT: Alright, I've narrowed down the problem a lot. First of all, I forgot the & inside the foreach ($wordlist as &$word) {
But as it was counting all the words, the words it has replaced are all still counted. See those 2 screenshots to see what I mean: http://imgur.com/oqqZR3h,xHEZKRz#0
If I understand this correctly, you want to know how many occurrences each word has, ignoring the so-called common words.
Assuming that $url is the page you will be running against and $common_words is your common words array, here is what you can do:
// Get the page contents and strip the HTML tags
$contents = strip_tags( file_get_contents($url) );
// Split the contents into words. $words[1] holds the captured words;
// $words[0] would also include the trailing non-word character.
preg_match_all("/([\w]+[']?[\w]*)\W/", $contents, $words);
$common_words = array('or', 'and', 'I', 'where');
// Count occurrences
$frequency = array_count_values($words[1]);
unset($words); // Release all that memory
// Drop the common words (the keys of $frequency are the words themselves)
$frequency = array_diff_key($frequency, array_flip($common_words));
var_dump($frequency);
At this point you will have an associative array with each word that is not a common word as key, and its number of occurrences as value.
UPDATE
A bit more about the RegEx. We need to match a word. The easiest possible way is (\w+). But that won't match words like I've or haven't (notice the '), which is why I made it more complicated. Also, \w doesn't match dashes, as in words like 6-year-old.
So I created a subgroup that matches word characters, including dashes and single quotes inside a word.
(?:\w'\w|\w|-)
The ?: at the beginning makes the group non-capturing, i.e. it is not included in the results; all I am doing is grouping the options for word contents. To match an entire word, the RegEx matches one or more of the subgroup above:
((?:\w'\w|\w|-)+)
So the RegEx preg_match_all() line should be:
preg_match_all("/((?:\w'\w|\w|-)+)/", $contents, $words);
Hope this helps.
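As a quick check of the updated pattern, here is a small sketch that runs it on a made-up sentence instead of a downloaded page. The sample text, the word list, and the case-insensitive counting are assumptions added for illustration.

```php
<?php
// Hypothetical sample text standing in for the stripped page contents.
$contents = "I haven't seen the 6-year-old or the dog, and the dog hasn't seen me.";
$common_words = array('or', 'and', 'I', 'the');

// One or more of: letter-apostrophe-letter, a word character, or a dash.
preg_match_all("/((?:\w'\w|\w|-)+)/", $contents, $words);

// Count case-insensitively, then drop the common words by key.
$frequency = array_count_values(array_map('strtolower', $words[1]));
$stop = array_flip(array_map('strtolower', $common_words));
$frequency = array_diff_key($frequency, $stop);

arsort($frequency); // most frequent words first
print_r($frequency);
```

Filtering after counting means the common words never need to be removed from the text itself, which avoids the problem of removed words still being counted.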
I changed $wordlist to $mywordlist, and it still works!
<?php
$string = 'sand band or nor and where whereabouts foo';
$wordlist = array("or", "and", "where");
$mywordlist=array("sand","band");
foreach ($mywordlist as &$word) {
$word = '/\b' . preg_quote($word, '/') . '\b/';
}
$string = preg_replace($mywordlist, '', $string);
var_dump($string);
?>
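I suppose you can simply do it like this:
$common_words = "foo baq etc etc";
$str = "foo bar baz"; // input
foreach (explode(" ", $common_words) as $word){
    // Remove whole-word occurrences only. Note that strtr() with an empty
    // replacement string would return $str unchanged.
    $str = preg_replace('/\b' . preg_quote($word, '/') . '\b/', '', $str);
}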

Loop over array and replacing with regex returns empty string

I have a String which contains substrings which I have to replace. The substrings are stored in an array. When I loop through the array everything works fine, until the array has more than 120 entries.
foreach ( $activeTags as $k => $v ) {
$find = $activeTags[$k]['Tag']['tag'];
$replace = 'that';
$pattern = "/\#\#[a-zA-Z][a-zA-Z]\#\#.*\b$find\b.*\#\#END_[a-zA-Z][a-zA-Z]\#\#|$find/";
$sText = '<p>Do not replace ##DS## this ##END_DS## replace this.</p>';
$sText = preg_replace_callback($pattern, function($match) use($find, $replace){
if($match[0] == $find){
return($replace);
}else{
return($match[0]);
}
}, $sText);
}
When count($activeTags) == 121, I only get an empty string.
Does anyone have an idea why this happens?
Try this improved pattern:
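// Inside double quotes the backreference must be written \\1; "\1" would be read by PHP as an octal character escape.
$pattern = "~##([a-zA-Z]{2})##.*?\b$find\b.*?##END_\\1##|$find~s";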
Discussion
The s flag indicates that the dot (.) should also match newlines. In your example, p tags are mentioned, so I guess it is an HTML fragment. Since newlines are allowed in HTML, I have added the s flag. Moreover, I have made the pattern more readable by:
using custom pattern delimiters: / becomes ~, so you avoid escaping slashes
replacing duplicate subpatterns: [a-zA-Z][a-zA-Z] becomes [a-zA-Z]{2}
taking advantage of the sequence ##DS## ##END_DS##: I use a backreference (\1) to match what was captured by the first group.
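The question does not show which entry triggers the empty result, so the cause can only be guessed at. One possibility is that a tag contains a regex metacharacter or the pattern delimiter: the preg_* functions return null on failure, and null prints as an empty string. The sketch below assumes a made-up tag list and shows two defensive measures, escaping the search term with preg_quote() and checking the return value before overwriting $sText.

```php
<?php
// Hypothetical tag list; the second entry contains a slash.
$activeTags = array('this', 'a/b');
$replace = 'that';
$sText = '<p>Do not replace ##DS## this ##END_DS## replace this and a/b.</p>';

foreach ($activeTags as $tag) {
    // Escape regex metacharacters and the ~ delimiter in the search term.
    $find = preg_quote($tag, '~');
    // Single quotes keep \1 and \b as regex escapes, not PHP string escapes.
    $pattern = '~##([a-zA-Z]{2})##.*?\b' . $find . '\b.*?##END_\1##|' . $find . '~s';

    $result = preg_replace_callback($pattern, function (array $m) use ($tag, $replace): string {
        // Protected ##XX## ... ##END_XX## blocks are returned untouched.
        return $m[0] === $tag ? $replace : $m[0];
    }, $sText);

    if ($result === null) {
        // null signals a failure, e.g. an invalid pattern or an exceeded limit.
        echo 'Regex error code: ', preg_last_error(), "\n";
        continue; // keep the text from the previous iteration
    }
    $sText = $result;
}

echo $sText, "\n";
```

Unlike the loop in the question, this version does not reset $sText on every iteration, so the replacements of all tags accumulate.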

preg matching and replacing elements

Hi how do I do a preg match on
$string1 = "[%refund%]processed_by"
$string2 = "[%refund%]date_sent"
I want to grab the bit inside the %% and then remove the [%item%] altogether, leaving just "processed_by" or "date_sent". I have had a go below but have got a bit stuck.
$unprocessedString = "[%refund%]date_sent"
$match = preg_match('/^\[.+\]/', $unprocessedString);
$string = preg_replace('/^\[.+\]/', $unprocessedString);
echo $match; // this should output refund
echo $string; // this should output date_sent
Your problem is with your use of the preg_match function. It returns 1 if the pattern matches and 0 if it does not, not the matched text. But if you pass it a variable as a third parameter, it stores the matches for the entire pattern and its subpatterns in that variable as an array.
So you can capture both of the parts you want in subpatterns with preg_match, which means you don't need preg_replace:
$unprocessedString = "[%refund%]date_sent";
preg_match('/^\[%(.+)%\](.+)/', $unprocessedString, $matches);
echo $matches[1]; // outputs 'refund'
echo $matches[2]; // outputs 'date_sent'
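When the input is not guaranteed to have the [%...%] prefix, it is safer to check the return value before reading $matches. A small sketch, assuming the prefix never contains a % character itself:

```php
<?php
$unprocessedString = "[%refund%]date_sent";

// [^%]+ stops at the closing %, so the first group cannot run past it.
if (preg_match('/^\[%([^%]+)%\](.+)$/', $unprocessedString, $matches)) {
    echo $matches[1], "\n"; // the part between the % signs
    echo $matches[2], "\n"; // the remainder after the closing bracket
} else {
    echo "No match\n";
}
```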

How to replace new lines by regular expressions

How can I match any number of new lines with a regular expression?
$var = "<p>some text</p><p>another text</p><p>more text</p>";
$search = array("</p>\s<p>");
$replace = array("</p><p>");
$var = str_replace($search, $replace, $var);
I need to remove every new line (\n), not <br/>, between two paragraphs.
To begin with, str_replace() (which you referenced in your original question) is used to find a literal string and replace it. preg_replace() is used to find something that matches a regular expression and replace it.
In the following code sample I use \s+ to find one or more occurrences of white space (new line, tab, space...). \s is whitespace, and the + modifier means one or more of the previous thing.
<?php
// Test string with white space and line breaks between paragraphs
$var = "<p>some text</p> <p>another text</p>
<p>more text</p>";
// Regex - Use ! as the delimiter, so that you don't have to escape the
// forward slash in '</p>'. This regex looks for a closing P tag, then one or more (+)
// whitespace characters, then an opening P tag. The i flag makes the search case-insensitive.
$search = '!</p>\s+<p>!i';
// We replace the match with a closing P tag followed by an opening P tag, with no
// whitespace in between.
$replace = '</p><p>';
// echo to test or use '=' to store the results in a variable.
// preg_replace returns a string in this case.
echo preg_replace($search, $replace, $var);
?>
I find it odd to have huge HTML strings, and then using some string search and replace hack to format that afterwards...
When constructing HTML with PHP, I like using arrays:
$htmlArr = array();
foreach ($dataSet as $index => $data) {
$htmlArr[] = '<p>Line#'.$index.' : <span>' . $data . '</span></p>';
}
$html = implode("\n", $htmlArr);
This way, every HTML line has its own $htmlArr[] value. Moreover, if you need your HTML to be pretty-printed, you can simply have some sort of method that indents your HTML by prepending whitespace to every array element according to some rule set. For example, if we have:
$htmlArr = array(
'<ol>',
'<li>Item 1</li>',
'<li>Item 2</li>',
'<li>Item 3</li>',
'</ol>'
);
Then the formatting function algorithm would be (a very simple one, considering that the HTML is well constructed):
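// Assumes well-formed HTML; void elements such as <br> are not handled.
function indent_html(array $array, $tabSpace = 2)
{
    $indent = 0; // Initial indent
    foreach ($array as &$value) {
        $open = preg_match_all('#<[a-z][^>]*(?<!/)>#i', $value); // Count how many opened elements
        $closed = preg_match_all('#</[a-z][^>]*>#i', $value); // Count how many closed elements
        $delta = $open - $closed;
        if ($delta < 0) $indent += $delta; // A closing line is outdented itself
        $value = str_repeat(' ', max(0, $indent) * $tabSpace) . $value;
        if ($delta > 0) $indent += $delta; // Next line's indent
    }
    unset($value); // Break the reference
    return $array;
}
Then implode("\n", indent_html($htmlArr)) gives the prettified HTML.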
After the question edit by Felix Kling, I realize that this has nothing to do with the question. Sorry about that :) Thanks though for the clarification.
