Sanitize sentence in php - php

The title may sound odd, but I'm trying to set up a preg_replace that cleans up after messy writers in a textarea. The rules are:
if there is an exclamation mark, there should not be another one right after it.
if a period is followed by a comma (.,), the comma wins and the result has to be ,
when there are one or more spaces before a comma, they should be removed.
the sentence cannot start or end with a comma.
there should never be more than 2 of the same letter in a row.
a space must always be present after a comma.
E.g.:
,My house, which is green., is nice!
My house..., which is green, is nice!!!
My house ,which is green,,, is nice!!
The end result should always be:
My house, which is green, is nice!
Is there an already built regex that takes care of this?
Solution: check out FakeRainBrigand's solution below!

I might have to use this for my own sites... nice idea!
<?php
$text = 'My hooouse..., which is greeeeeen , is nice!!! ,And pretty too...';
$pats = array(
'/([.!?]\s{2}),/', # Abc. ,Def
'/\.+(,)/', # ......,
'/(!)!+/', # abc!!!!!!!!
'/\s+(,)/', # abc , def
'/([a-zA-Z])\1\1/', # greeeeeeen
'/,(?!\s)/');
$fixed = preg_replace($pats, '$1', $text);
echo $fixed;
echo "\n\n";
?>
And the 'modified' version of $text: "My house, which is green, is nice! And pretty too."
UPDATE: Here's the version that handles "abc,def" -> "abc, def".
<?php
$text = 'My hooouse..., which is greeeeeen ,is nice!!! ,And pretty too...';
$pats = array(
'/([.!?]\s{2}),/', # Abc. ,Def
'/\.+(,)/', # ......,
'/(!)!+/', # abc!!!!!!!!
'/\s+(,)/', # abc , def
'/([a-zA-Z])\1\1/'); # greeeeeeen
$fixed = preg_replace($pats, '$1', $text);
$really_fixed = preg_replace('/,(?!\s)/', ', ', $fixed);
echo $really_fixed;
echo "\n\n";
?>
I would think this is a bit slower since it's an additional function call.

$result = preg_replace('/!+/', '!', $subject);
$result = preg_replace('/\.*,/', ',', $result);
$result = preg_replace('/\s+(?=,)/', '', $result);
$result = preg_replace('/^,*|,*$/', '', $result);
$result = preg_replace('/([a-z])\1+/i', '$1$1', $result);
$result = preg_replace('/,(?!\s)/', ', ', $result);
One by one matching to your rules :)
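The rule-by-rule approach can be bundled into one function. What follows is only a sketch: sanitizeSentence() is a hypothetical name, and it adds one rule that is not in the list above (collapsing runs of commas), which the third example sentence from the question appears to need. preg_replace() applies the patterns of an array in order, so the order of the rules matters.

```php
<?php
// Sketch only: sanitizeSentence() is a hypothetical helper, not part of any library.
function sanitizeSentence(string $text): string
{
    // Pattern => replacement, applied top to bottom.
    $rules = [
        '/!+/'              => '!',    // collapse repeated exclamation marks
        '/\.+\s*,/'         => ',',    // periods right before a comma: the comma wins
        '/\s+(?=,)/'        => '',     // drop whitespace before a comma
        '/,+/'              => ',',    // collapse repeated commas (extra rule, see lead-in)
        '/^[,\s]+|[,\s]+$/' => '',     // no leading or trailing comma or whitespace
        '/([a-z])\1{2,}/i'  => '$1$1', // at most two identical letters in a row
        '/,(?!\s)/'         => ', ',   // always a space after a comma
    ];

    return preg_replace(array_keys($rules), array_values($rules), $text);
}

echo sanitizeSentence('My house ,which is green,,, is nice!!'), "\n";
```

All three example sentences from the question should come out as "My house, which is green, is nice!" with these rules. Note that the letter rule keeps two letters, as the question asks, so "hooouse" becomes "hoouse", not "house".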

Related

Replace one line with another with Laravel or PHP

So I have a text file whose content looks, for example, as follows:
Name: 'John',
Surname: 'Doe',
Age: 35
But I don't know for sure if the current surname is Doe, anything could be generated there.
And I need to replace this surname with another one. So I need to somehow open the file, find the place I need (I know for sure that it starts with Surname: ' and ends with ',), and replace the string between these two substrings, whatever it was before, without breaking the file structure (losing line breaks and so on; the actual file is pretty long, so adding \n manually is not an option).
So far I've tried this
$content = file_get_contents('text.txt');
$search = "/[^Surname: '](.*)[^',]/";
$replace = 'Smith';
$content = preg_replace($search,$replace,$content);
file_put_contents('text.txt', $content);
But it replaces almost everything with 'Smith', because the combination of ', is pretty common in this file, and also it turns the entire file into one line.
So what could I do to solve my problem? Would highly appreciate any possible help!
UPD: str_replace could be what I need, but first I would need to retrieve the whole line Surname: 'Doe', from the file to get the current surname.
I would use the regex /^Surname: '.*',$/m based on your description and replace it with Surname: 'Smith',.
Code:
<?php
$content = file_get_contents('text.txt');
$search = "/^Surname: '.*',$/m";
$replace = "Surname: 'Smith',";
$content = preg_replace($search, $replace, $content);
file_put_contents('text.txt', $content);
Demo:
$ cat text.txt
Name: 'John',
Surname: 'Doe',
Age: 35
$ php a.php
$ cat text.txt
Name: 'John',
Surname: 'Smith',
Age: 35
This regex should help:
"/Surname: '(.*)'/"
$text = "Name: 'John',
Surname: 'Doe',
Age: 35";
$search = "/Surname:\s+'(.*?)',/is";
$replace = 'Surname: \'Smith\',';
$content = preg_replace($search, $replace, $text);
echo $content;
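One detail the answers above do not cover: if the new surname ever contains something like $1 or \1, preg_replace() treats it as a backreference. A minimal sketch that sidesteps this with preg_replace_callback() is shown here. replaceSurname() is a hypothetical helper name, and the pattern assumes "\n" line endings and that the surname itself contains no single quote.

```php
<?php
// Sketch only: replaceSurname() is a hypothetical helper; assumes "\n" line endings.
function replaceSurname(string $content, string $newSurname): string
{
    return preg_replace_callback(
        // Anchored to a whole line (m flag), so other "'," sequences are left alone.
        "/^(Surname: ')[^'\\n]*(',)$/m",
        // The callback's return value is inserted literally, so "$1" in the
        // new surname is not interpreted as a backreference.
        fn(array $m): string => $m[1] . $newSurname . $m[2],
        $content
    );
}

$content = "Name: 'John',\nSurname: 'Doe',\nAge: 35";
echo replaceSurname($content, 'Smith'), "\n";
```

Because the pattern never consumes the newline, the line breaks of the file are left untouched.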

Replace url strings in PHP

I have a string for example : I am a boy
I want to show this on my url for example in this way : index.php?string=I-am-a-boy
My program :
$title = "I am a boy";
$number_wrds = str_word_count($title);
if($number_wrds > 1){
$url = str_replace(' ','-',$title);
}else{
$url = $title;
}
What if I have a string : Destination - Silicon Valley
If I implement the same logic my url will be : index.php?string=Destination---Silicon-Valley
But I want to show only 1 hyphen.
I want to show a hyphen instead of a plus sign..
urlencode() will insert plus symbols, so it's not helping here.
Now if I use minus symbol then if the actual string is Destination - Silicon Valley, then the url will look like
Destination-Silicon-Valley and not
Destination---Silicon-Valley
Check this stackoverflow question title and the url. You will know what I am saying.
Use urlencode() to send strings along with an url:
$url = 'http://your.server.com/?string=' . urlencode($string);
In the comments you said that you don't want urlencode() and would rather just replace spaces with - characters.
First, you should "just do it": the if conditional and str_word_count() are just overhead. Basically, your example should look like this:
$title = "I am a boy";
$url = str_replace(' ','-', $title);
That's it.
You also said that this would cause problems if the original string already contains a -. I would use preg_replace() instead of str_replace() to solve that problem. Like this:
$string = 'Destination - Silicon Valley';
// replace spaces by hyphen and
// group multiple hyphens into a single one
$string = preg_replace('/[ -]+/', '-', $string);
echo $string; // Destination-Silicon-Valley
Use preg_replace instead:
$url = preg_replace('/\s+/', '-', $title);
\s+ means "one or more whitespace characters" (space, tab, carriage return, line feed, form feed).
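Note that replacing only whitespace still turns "Destination - Silicon Valley" into three hyphens in a row, because the original hyphen sits between two spaces. A small sketch that treats whitespace and hyphens as one separator class follows. slugify() is a hypothetical helper name, and rawurlencode() is added here as an assumption so that any other special characters are still safe in a query string.

```php
<?php
// Sketch only: slugify() is a hypothetical helper name.
function slugify(string $title): string
{
    // Any run of whitespace and/or hyphens becomes a single hyphen.
    $slug = preg_replace('/[\s-]+/', '-', trim($title));

    // Drop hyphens left at the edges, then percent-encode whatever else remains.
    return rawurlencode(trim($slug, '-'));
}

echo 'index.php?string=' . slugify('Destination - Silicon Valley'), "\n";
echo 'index.php?string=' . slugify('I am a boy'), "\n";
```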
use urlencode:
<?php
$s = "i am a boy";
echo urlencode($s) . "\n";
$s = "Destination - Silicon Valley";
echo urlencode($s) . "\n";
?>
return:
i+am+a+boy
Destination+-+Silicon+Valley
and urldecode:
<?php
$s = "i+am+a+boy";
echo urldecode($s)."\n";
$s = "Destination+-+Silicon+Valley";
echo urldecode($s);
?>
return:
i am a boy
Destination - Silicon Valley
just use urlencode() and urldecode(). It’s for sending Data with GET in the URL.

remove HTML from displaying in PHP

I have this text : http://pastebin.com/2Zgbs7hi
I want to remove the HTML code from it and display just the plain text, but I want to keep at least one line break wherever there are currently several line breaks.
i have tried:
$ticket["summary"] = 'pastebin example';
$TicketSummaryDisplay = nl2br($ticket["summary"]);
$TicketSummaryDisplay = stripslashes($TicketSummaryDisplay);
$TicketSummaryDisplay = trim(strip_tags($TicketSummaryDisplay));
$TicketSummaryDisplay = preg_replace('/\n\s+$/m', '', $TicketSummaryDisplay);
echo $TicketSummaryDisplay;
That displays plain text, but it shows it all as one big block of text with no line breaks at all.
Maybe this will save you some time.
<?php
libxml_use_internal_errors(true); //crazy o tags
$html = file_get_contents('http://pastebin.com/raw.php?i=2Zgbs7hi');
$dom = new DOMDocument;
$dom->loadHTML($html);
$result='';
foreach ($dom->getElementsByTagName('p') as $node) {
if (strstr($node->nodeValue, 'Legal Disclaimer:')){
break;
}
$result .= $node->nodeValue;
}
echo $result;
This example should successfully store text from html into an array of strings.
After stripping all the tags, you can use preg_split with the \R escape sequence (which matches any newline sequence) to convert the string into an array. That array will contain several blank values and a number of HTML non-breaking space entities, so we filter the array with array_filter() (it removes every item that does not satisfy the filter condition, in our case every empty value). There is a problem with the &nbsp; entity: &nbsp; and the space character are not the same, they have different character codes, so trim() will not remove it. Here are two possible solutions: the first, uncommented part only replaces &nbsp; and checks for whitespace characters, while the second, commented part decodes all HTML entities and also checks for spaces.
PHP:
$text = file_get_contents( 'http://pastebin.com/raw.php?i=2Zgbs7hi' );
$text = strip_tags( $text );
$array = array_filter(
preg_split( '/\R/', $text ),
function( &$item ) {
$item = str_replace( '&nbsp;', ' ', $item );
return trim( $item );
// $item = html_entity_decode( $item );
// return trim( str_replace( "\xC2\xA0", ' ', $item ) );
}
);
foreach( $array as $value ) {
echo $value . '<br />';
}
Array output:
Array
(
[8] => Hi,
[11] => Ashley has explained that I need to ask for another line and broadband for the wifi to work, please can you arrange this.
[13] => Regards
[23] => Legal Disclaimer:
[24] => This email and its attachments are confidential. If you received it by mistake, please don’t share it. Let us know and then delete it. Its content does not necessarily represent the views of The Dragon Enterprise
[25] => Centre and we cannot guarantee the information it contains is complete. All emails are monitored and may be seen by another member of The Dragon Enterprise Centre's staff for internal use
)
Now you should have a clean array containing only the items that have a value. By the way, newlines in HTML are expressed through <br />, not through \n. Your example still has them when viewed as a response in a web browser, but they are only visible in the page source. I hope I did not miss the point of the question.
Try this to get text output with line breaks:
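The same idea can be written so that it returns a string with exactly one line break between the non-empty lines, which is what the question asks for. This is a sketch under assumptions: htmlToPlainLines() is a hypothetical name, the input is assumed to be valid UTF-8 (the u flag on \R requires it), and the sample HTML is made up for illustration.

```php
<?php
// Sketch only: htmlToPlainLines() is a hypothetical helper; input is assumed to be UTF-8.
function htmlToPlainLines(string $html): string
{
    $text = strip_tags($html);
    // Turn entities such as &nbsp; into real characters ...
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // ... and the non-breaking space (U+00A0) into a plain space so trim() can remove it.
    $text = str_replace("\u{00A0}", ' ', $text);

    // Split on any newline sequence, trim each line, and drop the empty ones.
    $lines = array_map('trim', preg_split('/\R/u', $text));
    $lines = array_filter($lines, fn(string $line): bool => $line !== '');

    return implode("\n", $lines);
}

$html = "<p>Hi,</p>\n\n<p>&nbsp;</p>\n<p>Please arrange this.</p>\n\n\n<p>Regards</p>";
// nl2br() converts the remaining newlines to <br /> for display in a browser.
echo nl2br(htmlspecialchars(htmlToPlainLines($html))), "\n";
```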
<?php
$ticket["summary"] = file_get_contents('http://pastebin.com/raw.php?i=2Zgbs7hi');
$TicketSummaryDisplay = nl2br($ticket["summary"]);
echo strip_tags($TicketSummaryDisplay,'<br>');
?>
You are asking on how to add line-breaks to your "one big block of text with no line breaks at all".
Short answer
After you stripped the HTML tags, apply wordwrap with a desired text-block length
$text = wordwrap($text, 90, "<br />\n");
I really wonder why nobody suggested that function before.
There is also chunk_split, which doesn't take words into account and just splits after a certain number of characters, breaking words, but that's not what you want, I guess.
PHP
<?php
$text = file_get_contents('http://pastebin.com/raw.php?i=2Zgbs7hi');
/**
* Returns string without html tags, also
* takes control chars, spaces and "&nbsp;" into account.
*/
function dropHtmlTags($string) {
// remove html tags
//$string = preg_replace ('/<[^>]*>/', ' ', $string);
$string = strip_tags($string);
// control characters and "&nbsp"
$string = str_replace("\r", '', $string); // remove
$string = str_replace("\n", ' ', $string); // replace with space
$string = str_replace("\t", ' ', $string); // replace with space
$string = str_replace("&nbsp;", ' ', $string);
// remove multiple spaces
$string = preg_replace('/ {2,}/', ' ', $string);
$string = trim($string);
return $string;
}
$text = dropHtmlTags($text);
// The Answer: insert line breaks after 95 chars,
// to get rid of the "one big block of text with no line breaks at all"
$text = wordwrap($text, 95, "<br />\n");
// if you want to insert line-breaks before the legal disclaimer,
// uncomment the next line
//$text = str_replace("Regards Legal Disclaimer", "<br /><br />Regards Legal Disclaimer", $text);
echo $text;
?>
Result
first section shows your text block
second section shows the text with wordwrap applied (code from above)
Hello it can be done as follows:
$abc= file_get_contents('http://pastebin.com/raw.php?i=2Zgbs7hi');
$abc = strip_tags($abc); // the string to clean is the first argument
echo $abc;
Please, let me know whether it works
you may use
<?php
$a= file_get_contents('a.txt');
echo nl2br(htmlspecialchars($a));
?>
<?php
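$handle = @fopen("pastebin.html", "r"); // @ suppresses the warning if the file cannot be opened
if ($handle) {
while (!feof($handle)) {
// fgetss() reads a line and strips HTML tags; note that it was removed in PHP 8.0
$buffer = fgetss($handle, 4096);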
echo $buffer;
}
fclose($handle);
}
?>
output is
Hi,
Ashley has explained that I need to ask for another line and broadband for the wifi to work, please can you arrange this.
Regards
Legal Disclaimer:
This email and its attachments are confidential. If you received it by mistake, please don’t share it. Let us know and then delete it. Its content does not necessarily represent the views of The Dragon Enterprise
Centre and we cannot guarantee the information it contains is complete. All emails are monitored and may be seen by another member of The Dragon Enterprise Centre's staff for internal use
You can probably write additional code to convert &nbsp; to spaces etc.
I'm not sure I did understand everything correctly but this seems to be your expected result:
$txt = file_get_contents('http://pastebin.com/raw.php?i=2Zgbs7hi');
var_dump(preg_replace("/(\&nbsp\;(\s{1,})?)+/", "\n", trim(strip_tags(preg_replace("/(\s){1,}/", " ", $txt)))));
//more readable
$txt = preg_replace("/(\s){1,}/", " ", $txt);
$txt = trim(strip_tags($txt));
$txt = preg_replace("/(\&nbsp\;(\s{1,})?)+/", "\n", $txt);
The strip_tags() function strips HTML and PHP tags from a string, if that is what you are trying to accomplish.
Examples from the docs:
<?php
$text = '<p>Test paragraph.</p><!-- Comment --> Other text';
echo strip_tags($text);
echo "\n";
// Allow <p> and <a>
echo strip_tags($text, '<p><a>');
?>
The above example will output:
Test paragraph. Other text
<p>Test paragraph.</p> Other text

Text wrap nightmare in PHP after PREG_REPLACE

Provinces is a group_concat of all the individual records that contain province, some of which are blank.
So, when I encode:
$provinces = ($row['provinces']);
echo "<td>".wordwrap($provinces, 35, "<br />")."</td>";
This is what the result looks like:
Minas Gerais,,,Rio Grande do
Sul,Santa Catarina,Paraná,São Paulo
However, when I try to preg_replace out some of the nulls, and add some spaces with this expression:
$provinces = preg_replace($patterns,
$replaces, ($row['provinces']));
echo "<td>".wordwrap($provinces, 35, "<br />")."</td>";
This is what I get!!! :(
Minas Gerais, Rio Grande do
Sul, Santa
Catarina, Paraná, São Paulo
The output is very unnatural looking.
BTW: Here are the search and replace arrays:
$patterns[0] = '/,,([,]+)?/'; $replaces[0] = ',&nbsp;';
$patterns[1] = '/^,/'; $replaces[1] = '';
$patterns[2] = '/,$/'; $replaces[2] = '';
$patterns[3] = '/\b,\b/'; $replaces[3] = ',&nbsp;';
$patterns[4] = '/\s,/'; $replaces[4] = ',&nbsp;';
UPDATE: I even tried to change Paraná to Parana
Minas Gerais, Rio Grande do
Sul, Santa
Catarina, Parana, São
Paulo
Don't use &nbsp; as the replacement. wordwrap() counts it as 6 characters because it doesn't interpret the HTML entity. That's why your lines are breaking in odd places. If you want &nbsp;, replace the spaces after you call wordwrap().
Also, your first pattern should be:
// match one or more commas together
$patterns[0] = '/,+/';
Is the wordwrap() really necessary? It sounds like you are rendering this content into a table cell of some fixed width and you don't want individual entries to split across lines.
If this inference is correct - and if none of your entries is actually so long that forcing it to a single line will break your layout - then how about this: explode() on commas into an array, remove the whitespace-only entries, replace normal spaces in each array entry with &nbsp;, and implode() back on ', ' (a comma followed by a space). Then let the rendering browser break lines wherever it needs.
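The explode/filter/implode steps just described could look roughly like this. It is a sketch, not the poster's code: formatProvinces() is a hypothetical name, and the htmlspecialchars() call is an added assumption so that the values are safe to print inside the table cell.

```php
<?php
// Sketch only: formatProvinces() is a hypothetical helper name.
function formatProvinces(string $provinces): string
{
    // Split on commas and trim each entry.
    $entries = array_map('trim', explode(',', $provinces));

    // Drop the entries that were blank in the group_concat result.
    $entries = array_filter($entries, fn(string $e): bool => $e !== '');

    // Inside one entry, spaces become &nbsp; so a name never splits across lines.
    $entries = array_map(
        fn(string $e): string => str_replace(' ', '&nbsp;', htmlspecialchars($e)),
        $entries
    );

    // A normal space after each comma gives the browser a place to wrap.
    return implode(', ', $entries);
}

echo '<td>' . formatProvinces('Minas Gerais,,,Rio Grande do Sul,Santa Catarina') . '</td>', "\n";
```

With this approach wordwrap() is not needed at all; the browser wraps only at the plain spaces that follow the commas.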

Remove special chars from URL

I have a product database and I am trying to display the products with clean URLs. Below are some example product names:
PAUL MITCHELL FOAMING POMADE (150ml)
American Crew Classic Gents Pomade 85g
Tigi Catwalk Texturizing Pomade 50ml
What I need to do is display like below in the URL structure:
www.example.com/products/paul-mitchell-foaming-gel(150ml)
The problem I have is I want to do the following:
1. Remove anything inside parentheses (and the parentheses)
2. Remove any numbers next to g or ml e.g. 400ml, 10g etc...
I have been banging my head trying different string replaces but can't get it right. I would really appreciate some help.
Cheers
function makeFriendly($string)
{
$string = strtolower(trim($string));
$string = str_replace("'", '', $string);
$string = preg_replace('#[^a-z\-]+#', '_', $string);
$string = preg_replace('#_{2,}#', '_', $string);
$string = preg_replace('#_-_#', '-', $string);
return preg_replace('#(^_+|_+$)#D', '', $string);
}
this function helps you for cleaning url. (also cleans numbers)
try this,
<?php
$url = 'http%3A%2F%2Fdemo.com';
$decodedurl= urldecode($url);
echo $decodedurl;
?>
$from = array('/\(|\)/','/\d+ml|\d+g/','/\s+/');
$to = array('','','-');
$sample = 'PAUL MITCHELL FOAMING POMADE (150ml)';
$sample = strtolower(trim(preg_replace($from,$to,$sample),'-'));
echo $sample; // prints paul-mitchell-foaming-pomade
Try this:
trim(preg_replace('/\s\s+/', ' ', preg_replace("/(?:\(.*?\)|\d+\s*(?:g|ml))/", "", $input)));
// "abc (def) 50g 500 ml 3m(ghi)" --> "abc 3m"
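Putting the two requirements from the question together with the hyphenation step, a combined version might look like the sketch below. productSlug() is a hypothetical name, and the choice to keep other digits (such as the "3" in "3m") is an assumption about the desired result.

```php
<?php
// Sketch only: productSlug() is a hypothetical helper name.
function productSlug(string $name): string
{
    // 1. Remove anything inside parentheses, including the parentheses.
    $slug = preg_replace('/\([^)]*\)/', ' ', $name);

    // 2. Remove sizes such as "85g", "50ml" or "500 ml".
    $slug = preg_replace('/\b\d+\s*(?:ml|g)\b/i', ' ', $slug);

    // Lowercase, then turn every run of other characters into one hyphen.
    $slug = preg_replace('/[^a-z0-9]+/', '-', strtolower($slug));

    return trim($slug, '-');
}

echo 'www.example.com/products/' . productSlug('American Crew Classic Gents Pomade 85g'), "\n";
```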
