preg_replace_callback() memory issue

I'm having a memory issue while testing a find/replace function.
Say the search subject is:
$subject = "I wrote an article in the A+ magazine.
It'\s very long and full of words.
I want to replace every A+ instance in this text by a link
to a page dedicated to A+.";
The string to be found:
$find='A+';
$find = preg_quote($find,'/');
The replace callback function:
function replaceCallback($match)
{
if (is_array($match)) {
return '<a class="tag" rel="tag-definition" title="Click to know more about ' .stripslashes($match[0]) . '" href="?tag=' . $match[0]. '">' . stripslashes($match[0]) . '</a>';
}
}
And the call:
$result = preg_replace_callback($find, 'replaceCallback', $subject);
Now, the complete search pattern is drawn from the database. At the moment, it is:
$find = '/(?![^<]+>)\b(voice recognition|test project reference|test|synesthesia|Superflux 2007|Suhjung Hur|scripts|Salvino a. Salvaggio|Professional Lighting Design Magazine|PLDChina|Nicolas Schöffer|Naziha Mestaoui|Nabi Art Center|Markos Novak|Mapping|Manuel Abendroth|liquid architecture|LAb[au] laboratory for Architecture and Urbanism|l'Arca Edizioni|l' ARCA n° 176 _ December 2002|Jérôme Decock|imagineering|hypertext|hypermedia|Game of Life|galerie Roger Tator|eversion|El Lissitzky|Bernhard Tschumi|Alexandre Plennevaux|A+)\b/s';
This $find pattern is then looked for (and replaced if found) in 23 columns across 7 MySQL tables.
Using the suggested preg_replace() instead of preg_replace_callback() seems to have solved the memory issue, but I'm now running into a new problem: the subject returned by preg_replace() is missing a lot of content...
UPDATE:
The content loss was due to using preg_quote($find,'/');
It now works, except for 'A+', which becomes 'A ' after the process.

I'm trying to reproduce your error, but there's a parse error that needs to be fixed first. Either this isn't enough code to be a good sample, or there's genuinely a bug.
First of all, the value you store in $find is not a full pattern, so I had to add pattern delimiters.
Secondly, your replace string doesn't include the closing element for the anchor tags.
$subject = "
I wrote an article in the A+ magazine. It'\s very long and full of words. I want to replace every A+ instance in this text by a link to a page dedicated to A+.
";
$find='A+';
$find = preg_quote($find,'/');
function replaceCallback($match)
{
if (is_array($match)) {
return '<a class="tag" rel="tag-definition" title="Click to know more about ' .stripslashes($match[0]) . '" href="?tag=' . $match[0]. '">' . stripslashes($match[0]) . '</a>';
}
}
$result = preg_replace_callback( "/$find/", 'replaceCallback', $subject);
echo $result;
This code works, but I'm not sure it's what you want. Also, I have a strong suspicion that you don't need preg_replace_callback() at all.

This works for me. I had to change the pattern a bit, but it turns every A+ into a link. You are also missing a </a> at the end.
$subject = "I wrote an article in the A+ magazine. It'\s very long and full of words. I want to replace every A+ instance in this text by a link to a page dedicated to A+.";
function replaceCallback($match)
{
if (is_array($match))
{
return '<a class="tag" rel="tag-definition" title="Click to know more about ' .stripslashes($match[0]) . '" href="?tag=' . $match[0]. '">' . stripslashes($match[0]) . '</a>';
}
}
$result = preg_replace_callback("/A\+/", "replaceCallback", $subject);
echo $result;

Alright, I can see now why you're using the callback.
First of all, I'd change your callback to this:
function replaceCallback( $match )
{
if ( is_array( $match ) )
{
$htmlVersion = htmlspecialchars( $match[1], ENT_COMPAT, 'UTF-8' );
$urlVersion = urlencode( $match[1] );
return '<a class="tag" rel="tag-definition" title="Click to know more about ' . $htmlVersion . '" href="?tag=' . $urlVersion. '">' . $htmlVersion . '</a>';
}
return $match;
}
The stripslashes commands aren't going to do you any good.
As far as addressing the memory issue, you may want to break down your pattern into multiple patterns and execute them in a loop. I think your match is just too big/complex for PHP to handle it in a single call cycle.
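A minimal sketch of that loop, under stated assumptions: linkTerms() and its $chunkSize parameter are hypothetical names, the input is assumed to be valid UTF-8, and the link markup is shortened from the question. Each term is passed through preg_quote() on its own rather than quoting the assembled pattern, and lookarounds stand in for \b, which does not match between the "+" of "A+" and a following space.

```php
<?php
// Hypothetical helper, not from the original posts.
// Assumes $subject and $terms are valid UTF-8 (required by the /u flag).
function linkTerms(string $subject, array $terms, int $chunkSize = 10): string
{
    // Longest terms first, so "test project reference" wins over "test".
    usort($terms, function ($a, $b) {
        return strlen($b) - strlen($a);
    });

    foreach (array_chunk($terms, $chunkSize) as $chunk) {
        // Quote each term on its own; quoting the assembled pattern would
        // also escape the "|" and "(" that hold it together.
        $quoted = array_map(function ($term) {
            return preg_quote($term, '/');
        }, $chunk);

        // Skip existing links and tags, then match a term that is not
        // glued to a word character on either side.
        $pattern = '/<a\b[^>]*>.*?<\/a>(*SKIP)(*FAIL)'
            . '|<[^>]*>(*SKIP)(*FAIL)'
            . '|(?<!\w)(' . implode('|', $quoted) . ')(?!\w)/us';

        $result = preg_replace_callback($pattern, function ($match) {
            $html = htmlspecialchars($match[1], ENT_COMPAT, 'UTF-8');
            return '<a class="tag" rel="tag-definition" href="?tag='
                . urlencode($match[1]) . '">' . $html . '</a>';
        }, $subject);

        if ($result === null) {
            // preg_* functions return null on errors such as hitting
            // pcre.backtrack_limit.
            throw new RuntimeException('PCRE error code ' . preg_last_error());
        }
        $subject = $result;
    }

    return $subject;
}

echo linkTerms('The A+ magazine ran a test.', ['test', 'A+'], 1), PHP_EOL;
```

Because later passes skip anything already wrapped in an anchor, a short term such as "test" should not be linked again inside a longer term handled by an earlier pass. urlencode() turns "A+" into "A%2B", so the tag parameter is not read back as "A " on the receiving page.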

Related

preg_replace: only replace instances of text with a trailing space

I have a preg_replace question. I am using preg_replace to generate contextual links in text blocks using the following code:
$contextualLinkStr = 'mytext';
$content = 'My text string which includes MYTEXT in various cases such as Mytext and mytext. However it also includes image tags such as <img src="http://www.myurl.com/mytext-1.jpg">';
$content = preg_replace('/' . $contextualLinkStr . '/i', '\\0', $content);
preg_replace works well on the text: it generates the relevant links while retaining case. But it also generates a link inside the URL of the image tag. I thought that simply adding a trailing space to the expression would fix it, since all text instances have a trailing space whereas image URLs never do, as follows:
$content = preg_replace('/' . $contextualLinkStr . '/i' . ' ', '\\0' . ' ', $content);
But this doesn't work. Can anybody tell me how I make the trailing space a condition of the match?
Thanks in advance.
Jason.

I've just worked it out guys. I was being daft. For reference the relevant code is:
$content = preg_replace('/' . $contextualLinkStr . ' /i', '\\0 ', $content);
Thanks.
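A possible alternative, sketched under assumptions: the trailing-space pattern misses occurrences followed by punctuation (such as "mytext." at the end of a sentence). A negative lookahead that rejects matches sitting inside a tag avoids that. The href value "/mytext" is a placeholder, since the real link target is not shown in the question.

```php
<?php
$contextualLinkStr = 'mytext';
$content = 'Includes MYTEXT and mytext. <img src="http://www.myurl.com/mytext-1.jpg">';

// (?![^<>]*>) rejects a match that is followed by ">" before any "<",
// that is, a match located inside a tag such as the <img> above.
$pattern = '/' . preg_quote($contextualLinkStr, '/') . '(?![^<>]*>)/i';

// The href is a placeholder for illustration only.
$linked = preg_replace($pattern, '<a href="/mytext">\\0</a>', $content);

echo $linked, PHP_EOL;
```

This assumes the content contains no stray ">" characters in plain text; for arbitrary HTML a DOM parser is the safer tool.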

PHP - find all hyperlinks in a post, add target and rel=nofollow attribute

I need to find a way to read content posted by a user, find any hyperlinks that might have been included, create anchor tags, and add target and rel=nofollow attributes to all those links.
I have come across some REGEX solutions like this:
(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))
But in other questions on SO about the same problem, it has been strongly recommended NOT to use regex and to use PHP's DOMDocument instead.
Whichever way is best, I need to add attributes like those mentioned above in order to harden all external links on the website.

First of all, the guidelines you mentioned advise against parsing HTML with regexes. As far as I understand, what you are trying to do is parse plain text from a user and convert it into HTML. For that purpose, regexes are usually just fine.
(Note that I assume you turn the text into links yourself and aren't using an external library for that. In the latter case you'd need to fix the HTML the library outputs, and for that you should use DOMDocument to iterate over all <a> tags and add the proper attributes to them.)
Now, you can parse it in two ways: server side, or client side.
Server side
Pros:
It outputs ready to use HTML.
It doesn't require users to enable Javascript.
Cons:
You need to add rel="nofollow" attribute for the bots to not follow the links.
Client side
Pros:
You don't need to add rel="nofollow" attribute for the bots, since they don't see the links in the first place - they're generated with Javascript and bots usually don't parse Javascript.
Cons:
Creating links that way requires users to enable Javascript.
Implementing stuff like that in Javascript can give the impression that site is slow, especially if there is a lot of text to parse.
It makes caching parsed text difficult.
I'll focus on implementing it server-side.
Server-side implementation
So, in order to parse links from user input and add any attributes you want to them, you can use something like this:
<?php
function replaceLinks($text)
{
$regex = '/'
. '(?<!\S)'
. '(((ftp|https?)?:?)\/\/|www\.)'
. '(\S+?)'
. '(?=$|\s|[,]|\.\W|\.$)'
. '/m';
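return preg_replace_callback($regex, function($match)
{
// Escape the matched URL so that quotes or angle brackets in user input cannot break out of the attribute.
$url = htmlspecialchars($match[0], ENT_QUOTES, 'UTF-8');
return '<a'
. ' target=""'
. ' rel="nofollow"'
. ' href="' . $url . '">'
. $url
. '</a>';
}, $text);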
}
Explanation:
(?<!\S): not preceded by non-whitespace characters.
(((ftp|https?)?:?)\/\/|www\.): accept ftp://, http://, https://, ://, // and www. as beginning of URLs.
(\S+?): match everything that is not whitespace, in non-greedy fashion.
(?=$|\s|[,]|\.\W|\.$): every URL must be followed by either the end of a line, whitespace, a comma, a dot followed by a character other than \w (this is to allow .com, .co.jp etc. to match), or a dot followed by the end of a line.
m flag: multiline mode, so that $ also matches at the end of each line.
Testing
Now, to support my claim that it works I added a few test cases:
$tests = [];
$tests []= ['http://example.com', '<a target="" rel="nofollow" href="http://example.com">http://example.com</a>'];
$tests []= ['https://example.com', '<a target="" rel="nofollow" href="https://example.com">https://example.com</a>'];
$tests []= ['ftp://example.com', '<a target="" rel="nofollow" href="ftp://example.com">ftp://example.com</a>'];
$tests []= ['://example.com', '<a target="" rel="nofollow" href="://example.com">://example.com</a>'];
$tests []= ['//example.com', '<a target="" rel="nofollow" href="//example.com">//example.com</a>'];
$tests []= ['www.example.com', '<a target="" rel="nofollow" href="www.example.com">www.example.com</a>'];
$tests []= ['user#www.example.com', 'user#www.example.com'];
$tests []= ['testhttp://example.com', 'testhttp://example.com'];
$tests []= ['example.com', 'example.com'];
$tests []= [
'test http://example.com',
'test <a target="" rel="nofollow" href="http://example.com">http://example.com</a>'];
$tests []= [
'multiline' . PHP_EOL . 'blah http://example.com' . PHP_EOL . 'test',
'multiline' . PHP_EOL . 'blah <a target="" rel="nofollow" href="http://example.com">http://example.com</a>' . PHP_EOL . 'test'];
$tests []= [
'text //example.com/slashes.php?parameters#fragment, some other text',
'text <a target="" rel="nofollow" href="//example.com/slashes.php?parameters#fragment">//example.com/slashes.php?parameters#fragment</a>, some other text'];
$tests []= [
'text //example.com. new sentence',
'text <a target="" rel="nofollow" href="//example.com">//example.com</a>. new sentence'];
Each test case is composed of two parts: source input and expected output. I used the following code to determine whether the function passes the tests above:
foreach ($tests as $test)
{
list ($source, $expected) = $test;
$actual = replaceLinks($source);
if ($actual != $expected)
{
echo 'Test ' . $source . ' failed.' . PHP_EOL;
echo 'Expected: ' . $expected . PHP_EOL;
echo 'Actual: ' . $actual . PHP_EOL;
die;
}
}
echo 'All tests passed' . PHP_EOL;
I think this gives you an idea of how to solve the problem. Feel free to add more tests and experiment with the regex itself to make it suitable for your specific needs.
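For the other case mentioned in the note near the top of this answer, where a library has already produced HTML, a rough DOMDocument sketch could look like the following. hardenLinks() is a hypothetical name, the input is assumed to be a UTF-8 HTML fragment, and the target value "_blank" is an assumption, since the question does not say which target is wanted.

```php
<?php
// Hypothetical helper; assumes $html is a UTF-8 HTML fragment.
function hardenLinks(string $html): string
{
    // Collect parser warnings internally instead of emitting them
    // for sloppy user-supplied markup.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    // The XML declaration hints the encoding to the parser; the <div>
    // is a wrapper so the fragment can be serialized again without
    // the <html> and <body> elements the parser adds.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $a) {
        $a->setAttribute('rel', 'nofollow');
        $a->setAttribute('target', '_blank'); // assumed value
    }

    // The wrapper is the first <div> in document order.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }

    return $out;
}

echo hardenLinks('<p>See <a href="http://example.com">this page</a>.</p>'), PHP_EOL;
```

This overwrites any rel or target value a link already has; merging with existing values would need an extra getAttribute() check.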

You might be interested in Goutte
you can define your own filters etc.

Get the content to post using jQuery and process it before posting it to PHP.
$('#idof_content').val(
$('#idof_content').val().replace(/\b(http(s|):\/\/|)(www\.\S+)/ig,
"<a href='http\$2://\$3' target='_blank' rel='nofollow'>\$3</a>"));

PHP regex split text to insert HTML

Very(!) new to regex but...
I have the following text strings outputted from a $title variable:
A. This is a title
B. This is another title
etc...
I'm after the following:
<span>A.</span> This is a title
<span>B.</span> This is another title
etc...
Currently I have the following code:
$title = $element['#title'];
if (preg_match("([A-Z][\.])", $title)) {
return '<li' . drupal_attributes($element['#attributes']) . ">Blarg</li>\n";
} else {
return '<li' . drupal_attributes($element['#attributes']) . '>' . $output . $sub_menu . "</li>\n";
}
This replaces anything from A. through Z. with Blarg, but I'm not sure how to take it further.
In the TextWrangler app I could wrap a regex in brackets and output each argument like so:
argument 1 = \1
argument 2 = \2
etc...
I know I need to add an additional regex to grab the remainder of the text string.
Perhaps a regex guru could help a novice out!
Thanks,
Steve

Try
$title = 'A. This is a title';
$title = preg_replace('/^[A-Z]\./', '<span>$0</span>', $title);
echo $title;
// <span>A.</span> This is a title
If the string contains newlines and other titles following them, add the m modifier after the ending delimiter.
If the regex doesn't match then no replacements will be made, so there is no need for the if statement.
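As a small illustration of the m modifier mentioned above (the two-line $titles string is made up from the question's sample titles):

```php
<?php
// Two titles in one string, separated by a newline (sample data).
$titles = "A. This is a title\nB. This is another title";

// With the m modifier, ^ matches at the start of every line,
// not only at the start of the whole string.
$wrapped = preg_replace('/^[A-Z]\./m', '<span>$0</span>', $titles);

echo $wrapped, PHP_EOL;
// <span>A.</span> This is a title
// <span>B.</span> This is another title
```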

Is it always just 2 characters ("A.", "B.", "C.", ...)?
If so, you could work with a substring instead of a regex.
Just pick off the first 2 characters of the title and wrap the span around that substring.
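A quick sketch of that substring idea, assuming every title really does start with exactly one letter followed by a period:

```php
<?php
$title = 'A. This is a title';

// Wrap the first two characters ("A.") and append the rest unchanged.
// Only valid if every title starts with a single letter and a period.
$wrapped = '<span>' . substr($title, 0, 2) . '</span>' . substr($title, 2);

echo $wrapped, PHP_EOL;
// <span>A.</span> This is a title
```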

Try this (untested):
$title = $element['#title'];
if (preg_match("/([A-Z]\.)(.*)/", $title, $matches)) {
return '<li' . drupal_attributes($element['#attributes']) . "><span>{$matches[1]}</span>{$matches[2]}</li>\n";
} else {
return '<li' . drupal_attributes($element['#attributes']) . '>' . $output . $sub_menu . "</li>\n";
}
The change here was to first add / to the start and end of the string (to delimit the regex), then remove the [ and ] around the escaped period \. because it's just a literal character on its own, and then add another group which matches the rest of the string. I also added a $matches argument to preg_match() to capture these two groups for later use, which we do on the next line.
Note: You could also do this instead:
$title = preg_replace('/^([A-Z]\.)/', "<span>$1</span>", $title);
This will simply replace the A-Z followed by the period at the start of the string (denoted with the ^ character) with <span>, that character (grabbed with the brackets) and </span>.
Again, that's not tested, but should give you a headstart :)

Split variable content into multiple paragraphs

Hey there,
I have this little php code:
<p class="category_text"><? echo $category_text; ?></p>
I want to split the $category_text and get something like this:
This is sentence 1 of category_text
This is sentence 2 of category_text
and so on...
$category_text has about 300 words and, let's say, 6 sentences. How could I split the text into multiple paragraphs (delimited by the full stops ".")?
Thank you very much!

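// Split on full stops, trim each piece, and drop empty pieces (such as the one after the final full stop).
$sentences = array_filter(array_map('trim', explode('.', $category_text)), 'strlen');
echo '<p class="category_text">'
. implode('.</p><p class="category_text">', $sentences)
. '.</p>';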

You can just replace each "." with "." followed by the tag "<br />":
<p class="category_text"><? echo str_replace('.', '.<br />', $category_text); ?></p>
It's not a perfect solution! But if your text is simple enough, this little trick should work.
For example, if you have a line with 3 dots:
$category_text = "Ok...";
It will show up like this:
Ok.
.
.
If your sentences can also end with "?" or "!", you can use this:
<p class="category_text"><? echo str_replace(array('.', '!', '?'), array('.<br />', '!<br />', '?<br />'), $category_text); ?></p>
PS: My solution creates a single paragraph "<p>" containing multiple line breaks.

Try creating an array, and then output the sentences one by one. A sentence ending in "..." is still recognized, because it still ends in ". ".
$sentences = explode('. ', $category_text);
foreach($sentences as $val)
{
echo $val . ".<br /><br />";
}

You want to split a text into sentences, which is not trivial: explode(".", $string) often does not give good results.
Search Stack Overflow for "php split sentence", or directly try the solution to "PHP: Parse document / text into sentences":
http://www.zubrag.com/scripts/text-splitter.php
Once you have an array with sentences, use
echo '<p>' . implode('</p><p>', $sentences) . '</p>';
to echo them out.
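As a rough sketch of the idea (this is not how the linked script works, just one simple heuristic), you can split on whitespace that follows ".", "!" or "?". The sample text is made up, and abbreviations such as "e.g." would still be split incorrectly.

```php
<?php
// Sample text for illustration only.
$category_text = 'This is sentence 1. Is this sentence 2? Ok... This is the last one!';

// Split on whitespace preceded by a sentence-ending character.
// The lookbehind keeps the punctuation attached to its sentence.
$sentences = preg_split('/(?<=[.!?])\s+/', trim($category_text), -1, PREG_SPLIT_NO_EMPTY);

// Escape each sentence before wrapping it in a paragraph.
$paragraphs = array_map(function ($sentence) {
    return '<p class="category_text">' . htmlspecialchars($sentence, ENT_QUOTES, 'UTF-8') . '</p>';
}, $sentences);

echo implode("\n", $paragraphs), PHP_EOL;
```

With this heuristic "Ok..." stays in one piece, because there is no whitespace between the dots.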

Scan HTML for values with a special character before them

Say I have values on my page, like #100 and #246. What I want to do is scan the page for values with a # before them and then turn each one into a hyperlink.
$MooringNumbers = '#!' . $MooringNumbers . ' | ' . '#!' . $row1["Number"];
}
$viewedResult = '<tr><td>' .$Surname.'</td><td>'.$Title.'</td><td>'.$MooringNumbers . '</td><td>'.$Telephone.'</td><td>' . '[EDIT]</td>'.'<td>'. '[x]</td>'. '</tr>';
preg_replace('/#!(\d\d\d)/', '${1}', $viewedResult);
echo $viewedResult;
This is the broken code, which doesn't work.

I second Xoc: use the PHP manual. The function listed next to the one he pointed to is preg_replace_callback().
Just call:
$output = preg_replace_callback(
'/#\d\d\d/',
function ($matches) {
// Replace this with whatever you want to fetch from the database.
return strtolower($matches[0]);
},
$your_input
);
EDIT:
Since you always want to perform the same replacement, go with Xoc's preg_replace():
$output = preg_replace('/#!(\d\d\d)/', '${1}', $your_input);
Note: I don't have PHP here, so I give no guarantee of this code not wiping your entire hard disk ;)

You can accomplish this by using regular expressions, see PHP's preg_replace function.
$text = 'Lorem ipsum #300 dolar amet #20';
preg_match_all('/(^|\s)#(\w+)/', $text, $matches);
// Perform your database magic here for each element in $matches[2]
var_dump($matches[2]);
// Fake query result
$query_result = array ( 300 => 'http://www.example1.com', 20 => 'http://www.example2.com');
foreach($query_result as $result_key => $result_value)
{
$text = str_replace('#'.$result_key, ''. $result_value . '', $text);
}
var_dump($text);
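A variation on the same idea, as a sketch only: doing the lookup inside preg_replace_callback() avoids the str_replace() pitfall where '#20' would also be replaced inside a longer value such as '#200'. The $urls array is a stand-in for the real query result, and the anchor markup is an assumption about the desired output.

```php
<?php
$text = 'Lorem ipsum #300 dolor amet #20 and #999';

// Stand-in for the database lookup (fake data, as in the answer above).
$urls = array(300 => 'http://www.example1.com', 20 => 'http://www.example2.com');

$linked = preg_replace_callback('/(?<!\S)#(\d+)\b/', function ($m) use ($urls) {
    // Leave values without a known URL untouched.
    if (!isset($urls[$m[1]])) {
        return $m[0];
    }
    $href = htmlspecialchars($urls[$m[1]], ENT_QUOTES, 'UTF-8');
    return '<a href="' . $href . '">' . $m[0] . '</a>';
}, $text);

echo $linked, PHP_EOL;
```

Remember to use the returned string: preg_replace() and preg_replace_callback() do not modify their input in place.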
