Text wrap nightmare in PHP after PREG_REPLACE - php

Provinces is a group_concat of all the individual records that contain province, some of which are blank.
So, when I encode:
$provinces = ($row['provinces']);
echo "<td>".wordwrap($provinces, 35, "<br />")."</td>";
This is what the result looks like:
Minas Gerais,,,Rio Grande do
Sul,Santa Catarina,Paraná,São Paulo
However, when I try to preg_replace out some of the nulls, and add some spaces with this expression:
$provinces = preg_replace($patterns,
$replaces, ($row['provinces']));
echo "<td>".wordwrap($provinces, 35, "<br />")."</td>";`
This is what I get!!! :(
Minas Gerais, Rio Grande do
Sul, Santa
Catarina, Paraná, São Paulo
The output is very unnatural looking.
BTW: Here are the search and replace arrays:
$patterns[0] = '/,,([,]+)?/'; $replaces[0] = ', ';
$patterns[1] = '/^,/'; $replaces[1] = '';
$patterns[2] = '/,$/'; $replaces[2] = '';
$patterns[3] = '/\b,\b/'; $replaces[3] = ', ';
$patterns[4] = '/\s,/'; $replaces[4] = ', ';
UPDATE: I even tried to change Paraná to Parana
Minas Gerais, Rio Grande do
Sul, Santa
Catarina, Parana, São
Paulo

Don't use as the replacement. wordwrap() considers that 6 characters. It doesn't interpret the HTML entity. That's why your lines are breaking funny. If you want replace spaces after you wordwrap()
Also, your first pattern should be:
// match one or more commas together
$patterns[0] = '/,+/';

Is the wordwrap() really necessary? It sounds like you are rendering this content into a table cell of some fixed width and you don't want individual entries to split across lines.
If this inference is correct - and if none of your entries is actually so long that forcing it to a single line will break your layout - then how about this: explode() on commas into an array, remove the whitespace-only entries, replace normal spaces in each array entry with , and implode() back on , (a comma followed by a space). Then let the rendering browser break lines wherever it needs.

Related

Str_replace alternative for following strings in PHP

In my script I replaced all "," commas with quotation+spaces.
But when it comes to numbers which are like 3,456,778, it also converts the commas to quote+space. Is there any way to add to command to ignore big numbers like that so it doesn't convert it to:
3" 456" 778"
If there is quotationm+space+any number then convert quotation+space to comma.. I mean i know how to do it with str_replace command but i dont know how to select anynumber 0-9.
Any help to do it? To convert it to:
3,456,778
I think i need to elaborate some. I needed convert this text:
Value=3,456,778,id=777
To:
Value=3,456,778" id=777"
But problem is it also convert those middle commas in between numbers.
So even if I can change my str_replace command to this like
"If comma is not in between two numbers then only convert comma to quotation+space". It would be good. Is it possible?
What about this?
preg_replace("/,([^0-9]|$)/", "\"$1", $text);
This will match all the text except commas followed by numbers.
For instance, this:
$text = "123,23 adas , asdsa d, asdasd sa 1234,234324,asdas 324324 234,";
echo $text; echo "<br/>";
echo preg_replace("/,([^0-9]|$)/", "\"$1", $text);
Will echo this:
123,23 adas , asdsa d, asdasd sa 1234,234324,asdas 324324 234"
123,23 adas " asdsa d" asdasd sa 1234,234324"asdas 324324 234"
It is not really clear from your description what you actually want to do.
This might be a step into the right direction, however:
preg_replace('/([0-9]+)" /', '\\1,', '3" 456" 778"');
Not the best solution maybe,but can give it a try.
$copy_date = '3" 456" 778"';
$copy_date = preg_replace("(\"\s{1})", ",", $copy_date);
$copy_date1 = preg_replace("(\")", "", $copy_date);
print $copy_date1;
o/p:3,456,778

remove HTML from displaying in PHP

I have this text : http://pastebin.com/2Zgbs7hi
And i want to be able to remove the HTML code from it and just display the plain text but i want to keep at least one line break where there are currently a few line breaks
i have tried:
$ticket["summary"] = 'pastebin example';
$TicketSummaryDisplay = nl2br($ticket["summary"]);
$TicketSummaryDisplay = stripslashes($TicketSummaryDisplay);
$TicketSummaryDisplay = trim(strip_tags($TicketSummaryDisplay));
$TicketSummaryDisplay = preg_replace('/\n\s+$/m', '', $TicketSummaryDisplay);
echo $TicketSummaryDisplay;
that is displaying as plain text, but it shows it all as one big block of text with no line breaks at all
Maybe this will earn you some time.
<?php
libxml_use_internal_errors(true); //crazy o tags
$html = file_get_contents('http://pastebin.com/raw.php?i=2Zgbs7hi');
$dom = new DOMDocument;
$dom->loadHTML($html);
$result='';
foreach ($dom->getElementsByTagName('p') as $node) {
if (strstr($node->nodeValue, 'Legal Disclaimer:')){
break;
}
$result .= $node->nodeValue;
}
echo $result;
This example should successfully store text from html into an array of strings.
After stripping all the tags, you can use preg_split with \R special character ( matches any newline sequence ) to convert string into array. That array will now have several blank values, and there will be also some amount of html non-breaking space entities, so we will check the array for empty values with array_filter() function ( it will remove all items that do not satisfy the filter conditions, in our case, an empty value ). Here are a problem with entity, because and space characters are not the same, they have different ASCII code, so trim() function will not remove spaces. Here are two possible solutions, the first uncommented part will only replace &nbsp and check for white space characters, while the second commented one will decode all html entities and also check for spaces.
PHP:
$text = file_get_contents( 'http://pastebin.com/raw.php?i=2Zgbs7hi' );
$text = strip_tags( $text );
$array = array_filter(
preg_split( '/\R/', $text ),
function( &$item ) {
$item = str_replace( ' ', ' ', $item );
return trim( $item );
// $item = html_entity_decode( $item );
// return trim( str_replace( "\xC2\xA0", ' ', $item ) );
}
);
foreach( $array as $value ) {
echo $value . '<br />';
}
Array output:
Array
(
[8] => Hi,
[11] => Ashley has explained that I need to ask for another line and broadband for the wifi to work, please can you arrange this.
[13] => Regards
[23] => Legal Disclaimer:
[24] => This email and its attachments are confidential. If you received it by mistake, please don’t share it. Let us know and then delete it. Its content does not necessarily represent the views of The Dragon Enterprise
[25] => Centre and we cannot guarantee the information it contains is complete. All emails are monitored and may be seen by another member of The Dragon Enterprise Centre's staff for internal use
)
Now you should have clear array with only items with value in it. By the way, newlines in HTML are expressed through <br />, not through \n, your example as response in a web browser still has them, but they are only visible in page source code. I hope I did not missed the point of the question.
try this get text output with line brakes
<?php
$ticket["summary"] = file_get_contents('http://pastebin.com/raw.php?i=2Zgbs7hi');
$TicketSummaryDisplay = nl2br($ticket["summary"]);
echo strip_tags($TicketSummaryDisplay,'<br>');
?>
You are asking on how to add line-breaks to your "one big block of text with no line breaks at all".
Short answer
After you stripped the HTML tags, apply wordwrap with a desired text-block length
$text = wordwrap($text, 90, "<br />\n");
I really wonder, why nobody suggested that function before.
there is also chunk_split around, which doesn't take words into account and just splits after a certain number of chars. breaking words - but that's not what you want, i guess.
PHP
<?php
$text = file_get_contents('http://pastebin.com/raw.php?i=2Zgbs7hi');
/**
* Returns string without html tags, also
* removes takes control chars, spaces and " " into account.
*/
function dropHtmlTags($string) {
// remove html tags
//$string = preg_replace ('/<[^>]*>/', ' ', $string);
$string = strip_tags($string);
// control characters and "&nbsp"
$string = str_replace("\r", '', $string); // remove
$string = str_replace("\n", ' ', $string); // replace with space
$string = str_replace("\t", ' ', $string); // replace with space
$string = str_replace(" ", ' ', $string);
// remove multiple spaces
$string = preg_replace('/ {2,}/', ' ', $string);
$string = trim($string);
return $string;
}
$text = dropHtmlTags($text);
// The Answer: insert line breaks after 95 chars,
// to get rid of the "one big block of text with no line breaks at all"
$text = wordwrap($text, 95, "<br />\n");
// if you want to insert line-breaks before the legal disclaimer,
// uncomment the next line
//$text = str_replace("Regards Legal Disclaimer", "<br /><br />Regards Legal Disclaimer", $text);
echo $text;
?>
Result
first section shows your text block
second section shows the text with wordwrap applied (code from above)
Hello it can be done as follows:
$abc= file_get_contents('http://pastebin.com/raw.php?i=2Zgbs7hi');
$abc = strip_tags("\n", $abc);
echo $abc;
Please, let me know whether it works
you may use
<?php
$a= file_get_contents('a.txt');
echo nl2br(htmlspecialchars($a));
?>
<?php
$handle = #fopen("pastebin.html", "r");
if ($handle) {
while (!feof($handle)) {
$buffer = fgetss($handle, 4096);
echo $buffer;
}
fclose($handle);
}
?>
output is
Hi,
Ashley has explained that I need to ask for another line and broadband for the wifi to work, please can you arrange this.
Regards
Legal Disclaimer:
This email and its attachments are confidential. If you received it by mistake, please don’t share it. Let us know and then delete it. Its content does not necessarily represent the views of The Dragon Enterprise
Centre and we cannot guarantee the information it contains is complete. All emails are monitored and may be seen by another member of The Dragon Enterprise Centre's staff for internal use
You can probably write additional code to convert to spaces etc.
I'm not sure I did understand everything correctly but this seems to be your expected result:
$txt = file_get_contents('http://pastebin.com/raw.php?i=2Zgbs7hi');
var_dump(preg_replace("/(\&nbsp\;(\s{1,})?)+/", "\n", trim(strip_tags(preg_replace("/(\s){1,}/", " ", $txt)))));
//more readable
$txt = preg_replace("/(\s){1,}/", " ", $txt);
$txt = trim(strip_tags($txt));
$txt = preg_replace("/(\&nbsp\;(\s{1,})?)+/", "\n", $txt);
The strip_tags() function strips HTML and PHP tags from a string, if that is what you are trying to accomplish.
Examples from the docs:
<?php
$text = '<p>Test paragraph.</p><!-- Comment --> Other text';
echo strip_tags($text);
echo "\n";
// Allow <p> and <a>
echo strip_tags($text, '<p><a>');
?>
The above example will output:
Test paragraph. Other text
<p>Test paragraph.</p> Other text

replace multiple spaces in a string with a single space

I have a file with has several spaces among the words at some point. I need to clean the file and replace existing multi-spaced sequences with one space only. I have written the following statement which does not work at all, and it seems I'm making a big mistake.
$s = preg_replace("/( *)/", " ", $x);
My file is very simple. Here is a part of it:
Hjhajhashsh dwddd dddd sss ddd wdd ddcdsefe xsddd scdc yyy5ty ewewdwdewde wwwe ddr3r dce eggrg vgrg fbjb nnn bh jfvddffv mnmb weer ffer3ef f4r4 34t4 rt4t4t 4t4t4t4t ffrr rrr ww w w ee3e iioi hj hmm mmmmm mmjm lk ;’’ kjmm ,,,, jjj hhh lmmmlm m mmmm lklmm jlmm m
Your regex replaces any number of spaces (including zero) with a space. You should only replace two or more (after all, replacing a single space with itself is pointless):
$s = preg_replace("/ {2,}/", " ", $x);
What I usually do to clean up multiple spaces is:
while (strpos($x, ' ') !== false) {
$x = str_replace(' ', ' ', $x);
}
Conditions/hypotheses:
strings with multiple spaces are rare
two spaces are by far more common than three or more
preg_replace is expensive in terms of CPU
copying characters to a new string should be avoided when possible
Of course, if condition #1 is not met, this approach does not make sense, but it usually is.
If #1 is met, but any of the others is not (this may depend on the data, the software (PHP version) or even the hardware), then the following may be faster:
if (strpos($x, ' ') !== false) {
$x = preg_replace('/ +/', ' ', $x); // i.e.: '/␣␣+/'
}
Anyway, if multiple spaces appear only in, say, 2% of your strings, the important thing is the preventive check with strpos, and you probably don't care much about optimizing the remaining 2% of cases.
// Your input
$str = "Hjhajhashsh dwddd dddd sss ddd wdd ddcdsefe xsddd scdc yyy5ty ewewdwdewde wwwe ddr3r dce eggrg vgrg fbjb nnn bh jfvddffv mnmb weer ffer3ef f4r4 34t4 rt4t4t 4t4t4t4t ffrr rrr ww w w ee3e iioi hj hmm mmmmm mmjm lk ;’’ kjmm ,,,, jjj hhh lmmmlm m mmmm lklmm jlmm m";
echo $str.'<br>';
$output = preg_replace('!\s+!', ' ', $str); // Replace multispace with sigle.
echo $output;

Find a pattern within two or more sets of text

I have lots of data that I need to search through for certain patterns.
Problem is when looking for said patterns I have no reference to what I'm looking for.
Or in other words, I have two paragraphs. Each on similar topics. I need to be able to compare both paragraphs and find patterns. Phrases said in both paragraphs and how many times both were said.
Can't seem to find the solution because preg_match and other functions your required to supply the things your looking for.
Example paragraphs
Paragraph 1:
Bee Pollen is made by honeybees, and is the food of the young bee. It
is considered one of nature's most completely nourishing foods as it
contains nearly all nutrients required by humans. Bee-gathered pollens
are rich in proteins (approximately 40% protein), free amino acids,
vitamins, including B-complex, and folic acid.
Paragraph 2:
Bee Pollen is made by honeybees. It is required for the fertilization
of the plant. The tiny particles consist of 50/1,000-millimeter
corpuscles, formed at the free end of the stamen in the heart of the
blossom, nature's most completely nourishing foods. Every variety of
flower in the universe puts forth a dusting of pollen. Many orchard
fruits and agricultural food crops do, too.
So from those examples these patterns:
Bee Pollen is made by honeybees
and:
nature's most completely nourishing foods
Both phrases are found in both paragraphs.
This is potentially a complex question depending on whether you're looking for similar phrases or phrases that match word for word.
Finding exact word-for-word matches is quite simple all you need to do is split on common breaks like punctuation marks (e.g. .,;:) and perhaps on conjunctions as well (e.g. and or). However, the problem comes when you come to, for example, adjectives two phrases might be exactly the same but have one word different, like so:
The world is spinnnig around its axis at a tremendous speed.
The world is spinning around its axis at a magnificent speed.
This won't match because tremendous and magnificent are used in place of one another. Potentially you could work around this, however, that would be a more complex question.
Answer
If we stick to the simple side of things we can achieve phrase matching with just a few lines of code (4 in this example; not including the formatting for comments/readability).
$wordSplits = 'and or on of as'; //List of words to split on
preg_match_all('/(?<m1>.*?)([.,;:\-]| '.str_replace(' ', ' | ', trim($wordSplits)).' )/i', $para1, $matches1);
preg_match_all('/(?<m2>.*?)([.,;:\-]| '.str_replace(' ', ' | ', trim($wordSplits)).' )/i', $para2, $matches2);
$commonPhrases = array_filter( //Removes blank $key=>$value pairs
array_intersect( //Finds matching paterns
array_map(function($item){
return(strtolower(trim($item))); //Cleans array for $para1 values - removes leading and following spaces
}, $matches1['m1']),
array_map(function($item){
return(strtolower(trim($item))); //Cleans array for $para2 values - removes leading and following spaces
}, $matches2['m2'])
)
);
var_dump($commonPhrases);
/**
OUTPUT:
array(2) {
[0]=>
string(31) "bee pollen is made by honeybees"
[5]=>
string(41) "nature's most completely nourishing foods"
}
/*
The above code will find matches splitting both on punctuation (defined in [...] of the preg_match_all pattern) it will also concatenate the word list (matching only words in the word list with a preceding and following space).
Wordlist
You can change the word list to include any breaks you like, editing the list until you get the phrases you are after, examples:
$wordSplits = 'and or';
$wordSplits = 'and but if or';
$wordSplits = 'a an as and by but because if in is it of off on or';
Punctuation
You can add any punctuation marks you like into the list (between [ and ]), however remember that some characters do have special meanings and may need to be escaped (or placed appropriately): - and ^ should become \- and \^ or be placed where their special meaning doesn't come into play.
You may consider changing:
([.,;:\-]|
To:
([.,;:\-] | //Adding a space before the pipe
So that you only split punctuation marks which are followed by a space. For example: this would mean that items like 50,000 won't be split.
Spaces and breaks
You may also consider changing the spaces to \s so that tabs and newlines etc are included and not just spaces. Like so:
'/(?<m1>.*?)([.,;:\-]|\s'.str_replace(' ', '\s|\s', trim($wordSplits)).'\s)/i'
This would also apply to:
([.,;:\-]\s|
If you decide to go down that route.
I've been working on this code, don't know if it suits your needs... Feel free to expand it!
$p1 = "Bee Pollen is made by honeybees, and is the food of the young bee. It is considered one of nature's most completely nourishing foods as it contains nearly all nutrients required by humans. Bee-gathered pollens are rich in proteins (approximately 40% protein), free amino acids, vitamins, including B-complex, and folic acid.";
$p2 = "Bee Pollen is made by honeybees. It is required for the fertilization of the plant. The tiny particles consist of 50/1,000-millimeter corpuscles, formed at the free end of the stamen in the heart of the blossom, nature's most completely nourishing foods. Every variety of flower in the universe puts forth a dusting of pollen. Many orchard fruits and agricultural food crops do, too.";
// Strip strings of periods etc.
$p1 = strtolower(str_replace(array('.', ',', '(', ')'), '', $p1));
$p2 = strtolower(str_replace(array('.', ',', '(', ')'), '', $p2));
// Extract words from first paragraph
$w1 = explode(" ", $p1);
// Build search string
$search = '';
$found = array();
foreach ($w1 as $word) {
//echo 'Word: ' . $word . "<br />";
$search .= ' ' . $word;
$search = trim($search);
//echo '. . Search string: '. $search . "<br /><br />";
if (substr_count($p2, $search)) {
$old_search = $search;
$num_occured = substr_count($p2, $search);
//echo " . . . found!" . "<br /><br /><br />";
$add = TRUE;
} else {
//echo " . . . not found! Generating new search string: " . $word . '<br />';
if ($add) {
$found[] = array('pattern' => $old_search, 'occurences' => $num_occured);
$add = FALSE;
}
$old_search = '';
$search = $word;
}
}
print_r($found);
The above code finds occurences of patterns from the first string in the second one.
I'm sure it can be written better, but since it's past midnight (local time), I'm not as "fresh" as I'd like to be...
Codepad-link

nl2br for paragraphs

$variable = 'Afrikaans
Shqip - Albanian
Euskara - Basque';
How do I convert each new line to paragraph?
$variable should become:
<p>Afrikaans</p>
<p>Shqip - Albanian</p>
<p>Euskara - Basque</p>
Try this:
$variable = str_replace("\n", "</p>\n<p>", '<p>'.$variable.'</p>');
The following should do the trick :
$variable = '<p>' . str_replace("\n", "</p><p>", $variable) . '</p>';
Be careful, with the other proposals, some line breaks are not catch.
This function works on Windows, Linux or MacOS :
function nl2p($txt){
return str_replace(["\r\n", "\n\r", "\n", "\r"], '</p><p>', '<p>' . $txt . '</p>');
}
$array = explode("\n", $variable);
$newVariable = '<p>'.implode('</p><p>', $array).'</p>'
<?php
$variable = 'Afrikaans
Shqip - Albanian
Euskara - Basque';
$prep0 = str_replace(array("\r\n" , "\n\r") , "\n" , $variable);
$prep1 = str_replace("\r" , "\n" , $prep0);
$prep2 = preg_replace(array('/\n\s+/' , '/\s+\n/') , "\n" , trim($prep1));
$result = '<p>'.str_replace("\n", "</p>\n<p>", $prep2).'</p>';
echo $result;
/*
<p>Afrikaans</p>
<p>Shqip - Albanian</p>
<p>Euskara - Basque</p>
*/
?>
Explanation:
$prep0 and $prep1: Make sure each line ends with \n.
$prep2: Remove redundant whitespace. Keep linebreaks.
$result: Add p tags.
If you don't include $prep0, $prep1 and $prep2, $result will look like this:
<p>Afrikaans
</p>
<p>Shqip - Albanian
</p>
<p>Euskara - Basque</p>
Not very nice, I think.
Also, don't use preg_replace unless you have to. In most cases, str_replace is faster (at least according to my experience). See the comments below for more information.
Try:
$variable = 'Afrikaans
Shqip - Albanian
Euskara - Basque';
$result = preg_replace("/\r\n/", "<p>$1</p>", $variable);
echo $result;
I know this is a very old thread, but I want to highlight, that suggested solutions can have some issues in HTML world:
They do not check whether there is already a p tag around respective paragraph. This can result in extra paragraphs. At least some browsers will then show this as extra paragraphs, meaning <p>line1<p>line2</p>line3</p> will result in 3 paragraphs, which may not be the intention.
In fact, there is a bunch of tags, that are not expected inside of p, as per the spec of phrasing content. Or rather there is a limited set of tags, tha can be.
They will change new lines inside tags, where you want to preserve new lines as is. pre and textarea are the ones, where you could generally want that. code, samp, kbd and var are an example of other common values, but technically it can be any tag with white-space CSS property set to either pre, pre-wrap, pre-line or break-spaces.
They usually only check for \r\n or just \r or \n, while there are actually more symbols, that would mean new line, and they also have respective HTML entities, which can easily occur in HTML string.
To "combat" these flaws, at least, to an extent, I've just released a nl2tag library, which can also "convert" new lines to <li> items and has an "improved" nl2br logic (mostly for the sake of whitespace retention).
It's far from perfect (check the readme for limitations), but should cover you in case of relatively simple HTML string.

Categories