PHP preg_replace replacing line break

When renumbering an array using
$arr=array_values($arr); // Renumber array
I realized that a line break is being introduced into one of the array strings, which I don't want.
My string is going from:
Property Type
to:
Property
Type
In any case I am using:
$newelement = preg_replace("/[^A-Za-z0-9\s\s+]/", " ", $element);
already to remove unwanted characters prior to database insertion, so I tried to change it to:
$newelement = preg_replace("/[^A-Za-z0-9\s\s+'<br>''<br>''/n''/cr']/", " ", $element);
But there is no change, and the line feed/line break/carriage return is still there.
Am I doing the preg_replace call correctly?

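That regex looks a bit complicated. Also, the ^ at the beginning of the character class means "anything except A-Z... or whitespace", so you are explicitly telling it not to replace linefeeds. Note that inside [...] sequences such as '<br>' and '/n' are just sets of individual characters ('/n' is a slash and the letter n, not a newline).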
How about
$newelement = preg_replace("/[\n\r]/", "", $element);
or
$newelement = preg_replace("/[^A-Za-z ]/", "", $element);
\s also matches linefeed (\n).
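A small sketch of the difference, using a hypothetical input string. Inside a character class, \s\s+ is simply whitespace plus a literal +, so the original pattern leaves CR/LF alone, while a class that lists only \r and \n replaces them:

```php
<?php
// Hypothetical input with a line break between the two words
$element = "Property\r\nType";

// Original pattern: \s is inside the negated class, so CR/LF are NOT replaced
$kept = preg_replace("/[^A-Za-z0-9\s\s+]/", " ", $element);

// Replace any run of CR/LF characters with a single space
$fixed = preg_replace("/[\r\n]+/", " ", $element);

var_dump($kept);  // still contains "\r\n"
var_dump($fixed); // string(13) "Property Type"
```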

This should work too:
// chr(32) is a space
// For CR
$element = strtr($element, chr(13), chr(32));
// For LF
$element = strtr($element, chr(10), chr(32));
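The two strtr() calls can also be folded into one, because with string arguments strtr() translates character by character (each character of the second argument becomes the character at the same position in the third). A minimal sketch with a made-up input:

```php
<?php
// Hypothetical input
$element = "Property\r\nType";

// "\r" => " " and "\n" => " " in a single pass
$element = strtr($element, "\r\n", "  ");

echo $element; // CR and LF each became one space, so there are two spaces between the words
```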

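This worked for me (note that it only replaces double line breaks, i.e. blank lines, and that you need to assign the result):
$element = preg_replace("/\r\n\r\n|\r\r|\n\n/", "<br />", $element);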
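A hedged variant of that pattern: PCRE's \R escape matches any newline sequence (\r\n as one unit, \n or \r), so \R\R covers the three alternatives above in a single token. The sample text is made up:

```php
<?php
// Hypothetical text with paragraph breaks in different line-ending styles
$element = "First\r\n\r\nSecond\n\nThird\r\rFourth";

// \R\R means "two consecutive line breaks", whatever their style
$element = preg_replace('/\R\R/', '<br />', $element);

echo $element; // First<br />Second<br />Third<br />Fourth
```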

It's a hack, but you can do something like this:
$email_body_string = preg_replace("/\=\r\n/", "", $email_body_string);
The replacement says find a line that ends with an equals sign and has the standard carriage return and line feed characters afterwards. Replace those characters with nothing ("") and the equals sign will disappear. The line below it will be pulled up to join the first line.
Now, this implies that you will never have a line that ends with an equals sign, which is a risk. To make it a bit safer, check the line length where the wrap (with the equals sign) appears. It's usually about 73 characters in from the beginning of the line. Then you could say:
// Join only lines whose "=" is the 73rd character (assumes 72 characters before it)
$email_body_string = preg_replace("/^(.{72})\=\r\n/m", '$1', $email_body_string);
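The "=" at the end of a wrapped line is a quoted-printable soft line break. If the body really is quoted-printable encoded (Content-Transfer-Encoding: quoted-printable), PHP's built-in quoted_printable_decode() removes those soft breaks and decodes sequences such as =3D, with no line-length heuristics. A minimal sketch with a made-up body (do not run it on text that is not quoted-printable, since literal "=XX" sequences would be decoded too):

```php
<?php
// Hypothetical quoted-printable body: "=\r\n" is a soft line break, "=3D" is "="
$email_body_string = "This line was wrapped by the mail=\r\n client, and a=3Db is kept.";

$decoded = quoted_printable_decode($email_body_string);

echo $decoded; // This line was wrapped by the mail client, and a=b is kept.
```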

Related

PHP str_replace scraped content with wild card?

I'm looking for a solution to strip some HTML from a scraped HTML page. The page has some repetitive data I would like to delete so I tried with preg_replace() to delete the variable data.
Data I want to strip:
Producent:<td class="datatable__body__item" data-title="Producent">Example
Groep:<td class="datatable__body__item" data-title="Produkt groep">Example1
Type:<td class="datatable__body__item" data-title="Produkt type">Example2
....
...
Must be like this afterwards:
Producent:Example
Groep:Example1
Type:Example2
So a big piece is the same except the word within the data-title piece. How could I delete this piece of data?
I tried a few things like this one:
$pattern = '/<td class=\"datatable__body__item\"(.*?)>/';
$tech_specs = str_replace($pattern,"", $tech_specs);
But that didn't work. Is there any solution to this?
Just use a wildcard:
$newstr = preg_replace('/<td class="datatable__body__item" data-title=".*?">/', '', $str);
.*? means "match anything, but as little as possible" (non-greedy), so it stops at the first ">
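The key difference from the attempt in the question is the function: str_replace() does not understand regular expressions, it searches for the literal text of $pattern, so nothing is found. A minimal sketch using the sample line from the question:

```php
<?php
$tech_specs = 'Producent:<td class="datatable__body__item" data-title="Producent">Example';
$pattern = '/<td class="datatable__body__item" data-title=".*?">/';

// str_replace() looks for the pattern text literally, so nothing is removed
$literal = str_replace($pattern, '', $tech_specs);

// preg_replace() interprets it as a regex; .*? stops at the first '">'
$cleaned = preg_replace($pattern, '', $tech_specs);

echo $cleaned; // Producent:Example
```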
Assuming that the string looked like this:
$string = 'Producent:<td class="datatable__body__item" data-title="Producent">Example';
You could get the beginning and the end of the string with this:
preg_match('/^(\w+:).*\>(\w+)/', $string, $matches);
echo implode([$matches[1], $matches[2]]);
Which, in this case, will output Producent:Example. You could then add this output to another variable or array you intend to use.
OR, since you mentioned replacing:
$string = preg_replace('/^(\w+:).*\>(\w+)/', '$1$2', $string);
But since the data will probably come as a variable number of lines, you can apply it to each line:
$string = 'Producent:<td class="datatable__body__item" data-title="Producent">Example
Groep:<td class="datatable__body__item" data-title="Produkt groep">Example1
Type:<td class="datatable__body__item" data-title="Produkt type">Example2';
$stringRows = explode(PHP_EOL, $string);
$pattern = '/^(\w+:).*\>(\w+)/';
$replacement = '$1$2';
foreach ($stringRows as &$stringRow) {
    $stringRow = preg_replace($pattern, $replacement, $stringRow);
}
unset($stringRow); // break the reference left behind by foreach
$string = implode(PHP_EOL, $stringRows);
Which will then output the string like you expect.
Explaining my regex:
The first group catches the first word up to the colon (:), then another group catches the last word. I had previously specified anchors for both ends, but after breaking the string into lines this didn't work as expected, so I kept only the anchor at the beginning.
^(\w+:) => the word at the beginning of the string, up to the colon
.*\> => everything else up to the last greater-than sign (escaped with a backslash, although > does not actually need escaping)
(\w+) => the word after the greater-than sign
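As an alternative to exploding the string, a hedged sketch: the m modifier makes ^ match at the start of every line, and without the s modifier .* cannot cross a newline, so one preg_replace() call can process all rows at once. The sample string is the one from this answer:

```php
<?php
$string = "Producent:<td class=\"datatable__body__item\" data-title=\"Producent\">Example\n"
    . "Groep:<td class=\"datatable__body__item\" data-title=\"Produkt groep\">Example1\n"
    . "Type:<td class=\"datatable__body__item\" data-title=\"Produkt type\">Example2";

// m: ^ matches at the start of each line; .* stays within one line
$string = preg_replace('/^(\w+:).*>(\w+)/m', '$1$2', $string);

echo $string;
// Producent:Example
// Groep:Example1
// Type:Example2
```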
Well, maybe my question wasn't that well written. I had a table which I needed to scrape from a website. I needed the info in the table, but had to clean up some parts as mentioned. The solution I finally made is this one, and it works. It still needs a little manual replacement work, but that is because of the stupid " they use for inch. ;-)
Solution:
$specs = []; // collected clean rows
// find the table in the source code
foreach ($techdata->find('table') as $table) {
    // filter out the rows
    foreach ($table->find('tr') as $row) {
        // take the innertext using simplehtmldom
        $tech_specs = $row->innertext;
        // strip some 'garbage'
        $tech_specs = str_replace(" \t\t\t\t\t\t\t\t\t\t\t<td class=\"datatable__body__item\">", "", $tech_specs);
        // find the first word of the string so I can use it
        $spec1 = explode('</td>', $tech_specs)[0];
        // use the found string to strip down the rest of the table
        $tech_specs = str_replace("<td class=\"datatable__body__item\" data-title=\"" . $spec1 . "\">", ":", $tech_specs);
        // manual correction because of the " used
        $tech_specs = str_replace("<td class=\"datatable__body__item\" data-title=\"tbv Montage benodigde 19\">", ":", $tech_specs);
        // manual correction because of the " used
        $tech_specs = str_replace("<td class=\"datatable__body__item\" data-title=\"19\">", ":", $tech_specs);
        // strip some 'garbage'
        $tech_specs = str_replace("\t\t\t\t\t\t\t\t\t\t", "\n", $tech_specs);
        $tech_specs = str_replace("</td>", "", $tech_specs);
        $tech_specs = str_replace(" ", "", $tech_specs);
        // put the clean row in an array ready for usage
        $specs[] = $tech_specs;
    }
}

Remove Next line at the beginning of a string in php

I'm planning to remove all the newlines at the beginning of a string.
I tried using str_replace("\n",null,$resultContent), but that removes all the newlines.
For example, I need to remove the newline at the beginning of this string:
"
String here
String following."
Only the newline at the beginning should be deleted.
Please refer to this page:
http://www.w3schools.com/php/func_string_trim.asp
Use ltrim($resultContent, "\n") to remove all newline characters from the start of the string.
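A minimal sketch with a made-up string: the second argument of ltrim() lists the characters to strip, so passing "\r\n" also covers Windows line endings, and only the left end of the string is touched:

```php
<?php
// Hypothetical input with mixed leading line breaks
$resultContent = "\r\n\nString here\nString following.";

// Strip only CR and LF, and only from the beginning
$resultContent = ltrim($resultContent, "\r\n");

echo $resultContent; // "String here" and "String following." on two lines
```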
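Just explode and take the first element (do not forget some sanity checks on the result):
$x = explode("\n", $resultContent);
$resultContent = $x[0];
Note that this keeps only the first line, so if the string starts with a newline, $x[0] is an empty string.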
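You can also check for the leading newline first (str_starts_with() needs PHP 8; use double quotes, since '\n' in single quotes is a literal backslash followed by n):
if (str_starts_with($resultContent, "\n")) { // true if the string starts with a newline
    $resultContent = substr($resultContent, 1); // remove only that first newline
}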
I'm not sure whether you just want to strip blank lines or remove everything after the first \n... I went with the former, so hopefully this is what you're after:
$string = "String first line
string second line";
$replaced = preg_replace('/[\r\n]+/', PHP_EOL, $string);
echo $replaced;
Returns:
String first line
string second line
Sounds like ltrim() is what you're looking for:
ltrim: Strip whitespace (or other characters) from the beginning of a string
echo $result_string = ltrim($string);

PHP replacing lines in a small paragraph from a big paragraph

I have two blocks of text: Trunk, and Card. Trunk has about 100 lines, and Card has 3. The 3 lines in Card exist in Trunk, but they aren't directly below each other.
What I'm trying to do is remove each line of Card from the string Trunk.
What came to mind is exploding Card into an Array and using a for each in loop, like in AS3, but that didn't work like planned. Here's my attempt:
$zarray = explode("\n", $card); //Exploding the 3 lines which were separated by linebreaks into an array
foreach ($zarray as $oneCard) //for each element of array
{
$oneCard.= "\n"; //add a linebreak at the end, so that when the text is removed from Trunk, there won't be an empty line in it.
print "$oneCard stuff"; //Strangely outputs all 3 elements of the array separated by \r, instead of just 1, like this:
//card1\rcard2\rcard3 stuff
$zard = preg_replace("/$oneCard/i", "", $trunx, 1);//removes the line in Card from Trunk, case insensitive.
$trunx = $zard; //Trunk should now be one line shorter.
}
So, how can I use the foreach loop so that it replaces properly, and uses 1 element each time, instead of all of them in one go?
Consider
$trunk = "
a
b
c
d
e
f
";
$card = "
c
e
a
";
$newtrunk = implode("\n", array_diff(
explode("\n", $trunk),
explode("\n", $card)
));
print $newtrunk; // b, d and f, one per line
Or the other way round, your wording is a bit unclear.
Try this, it will be faster than the preg_replace due to the small amount being replaced:
//Find the new lines and add a delimiter
$card = str_replace("\n", "\n|#|", $card);
//Explode at the delimiter
$replaceParts = explode('|#|', $card);
//Perform the replacement with str_replace and use the array
$text = str_replace($replaceParts, '', $text);
This assumes there is always a newline after each search part, and note that str_replace() is case-sensitive.
If you do not know whether there is a newline, you will need a regex with an optional match for the newline.
If you need it case-insensitive (like the /i in your regex), look at str_ireplace().
You could explode the $card, and keep the $trunk as string:
$needlearray = explode("\n", $card); //to generate the needle-array
$trunk = str_replace($needlearray,array(),$trunk); //replace the needle array by an empty array => replace by an empty string (according to the str_replace manual)
$trunk = str_replace("\n\n","\n",$trunk); //replace adjacent line breaks by only one line break
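For completeness, a hedged sketch of how the loop from the question could be fixed, assuming the goal is to remove whole lines case-insensitively. The \r in the question's output suggests Card's lines are separated by \r (or \r\n) rather than \n, so this splits on \R (any newline sequence) and escapes each line with preg_quote() before building the pattern, since characters like . or ( would otherwise be regex syntax. The sample data is made up:

```php
<?php
// Hypothetical sample data
$trunk = "alpha\nbeta\nGamma\ndelta\n";
$card  = "gamma\r\nalpha";

// \R splits on \r\n, \n or \r; empty pieces are dropped
foreach (preg_split('/\R/', $card, -1, PREG_SPLIT_NO_EMPTY) as $oneCard) {
    // Whole line (case-insensitive), followed by its line break or the end of the string
    $pattern = '/^' . preg_quote($oneCard, '/') . '(?:\R|\z)/mi';
    $trunk = preg_replace($pattern, '', $trunk, 1); // remove the first occurrence only
}

echo $trunk; // "beta" and "delta", one per line
```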

Erasing C comments with preg_replace

I need to erase all comments in $string which contains data from some C file.
The thing I need to replace looks like this:
something before that shouldnt be replaced
/*
* some text in between with / or * on many lines
*/
something after that shouldnt be replaced
and the result should look like this:
something before that shouldnt be replaced
something after that shouldnt be replaced
I have tried many regular expressions, but none of them works the way I need.
Here are some of the latest ones:
$string = preg_replace("/\/\*(.*?)\*\//u", "", $string);
and
$string = preg_replace("/\/\*[^\*\/]*\*\//u", "", $string);
Note: the text is in UTF-8, the string can contain multibyte characters.
You would also want to add the s modifier to tell the regex that . should also match newlines. I always think of s as meaning "treat the input text as a single line".
So something like this should work:
$string = preg_replace("/\\/\\*(.*?)\\*\\//us", "", $string);
Example: http://codepad.viper-7.com/XVo9Tp
Edit: Added extra escape slashes to the regex as Brandin suggested because he is right.
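To see the effect of the s modifier, a small sketch on text modeled after the question (the sample text is illustrative). It compares the question's first attempt with the pattern from this answer:

```php
<?php
// Sample input modeled on the question
$original = "something before that shouldn't be replaced\n"
    . "/*\n * some text in between with / or * on many lines\n */\n"
    . "something after that shouldn't be replaced\n";

// Question's attempt: without s, "." stops at "\n", so a multi-line comment never matches
$without = preg_replace("/\/\*(.*?)\*\//u", "", $original);

// With s, ".*?" can span lines and stops at the first "*/"
$with = preg_replace("/\\/\\*(.*?)\\*\\//us", "", $original);

echo $with; // the comment is gone; an empty line remains where it was
```

The newline that followed the comment stays behind, which is why an empty line remains; adding an optional \R? at the end of the pattern would consume it as well.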
I don't think a regexp is a good fit here. What about writing a very small parser to remove the comments? I haven't done PHP coding for a long time, so treat this as the idea (a simple algorithm) rather than polished code:
$buf = ''; // holds the source code without comments
$pos = 0;
$len = strlen($string);
while ($pos < $len) {
    // start of a block comment: skip everything up to and including the closing */
    if ($string[$pos] === '/' && $pos + 1 < $len && $string[$pos + 1] === '*') {
        $end = strpos($string, '*/', $pos + 2);
        $pos = ($end === false) ? $len : $end + 2; // unterminated comment: drop the rest
        continue;
    }
    $buf .= $string[$pos++]; // copy any other byte unchanged
}
where:
$string is the C source code
$buf is the resulting string, which grows as needed
Working byte by byte is safe for UTF-8 input, because '/' and '*' never occur inside a multibyte sequence.
It is very hard to do this perfectly without ending up writing a full C parser.
Consider the following, for example:
// Not using /*-style comment here.
// This line has an odd number of " characters.
while (1) {
printf("Wheee!
(*\/*)
\\// - I'm an ant!
");
/* This is a multiline comment with a // in, and
// an odd number of " characters. */
}
So, from the above, we can see that our problems include:
multiline comment sequences such as /* should be ignored within double quotes, unless those double quotes are themselves part of a comment.
single-line comment sequences can be contained in double-quoted strings, and in multiline strings.
Here's one possibility to address some of those issues, but far from perfect.
// Remove "-strings, //-comments and /*block-comments*/, then restore "-strings.
// Based on regex by mauke of Efnet's #regex.
$file = preg_replace('{("[^"]*")|//[^\n]*|(/\*.*?\*/)}s', '\1', $file);
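One of the gaps in that regex is an escaped quote inside a string, such as "a \" b": [^"]* stops at the escaped quote, and the rest of the string is then treated as code. A hedged sketch extending the same idea so that a backslash escapes the next character inside a "-string. It still does not handle character literals such as '"', and the sample input is made up:

```php
<?php
// Hypothetical C snippet: the string contains \", // and /* */ that must survive
$file = 'printf("a // b \" /* c */"); // trailing comment' . "\n"
    . 'x = 1; /* block */ y = 2;';

// Group 1 = a "-string (kept via \1); // and /* */ comments outside it are removed
$pattern = '{("(?:\\\\.|[^"\\\\])*")|//[^\n]*|/\*.*?\*/}s';
$file = preg_replace($pattern, '\1', $file);

echo $file;
```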
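Try this:
$string = preg_replace("#/\*\n?(.*?)\*/\n?#su", "", $string);
Use # as the regexp delimiters (so / needs no escaping) and add the s modifier (PCRE_DOTALL) so that . also matches newlines; u can stay, since the text is UTF-8. The m modifier (PCRE_MULTILINE) only changes how ^ and $ behave, so it is not needed here.
Reference: http://php.net/manual/en/reference.pcre.pattern.modifiers.php
It is important to note the lazy (.*?): with a greedy (.*) and more than one comment block, the match would run from the first /* to the last */ and delete the code in between. Use "dot matches all" with care.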

Regex to add line breaks before and after a string?

The following code removes comments, line breaks, and extra space from HTML and PHP files, but a problem I have is when the original file has <<<EOT; in it. What regex rule would I use to add a linebreak before and after <<<EOT; from $pre6?
//a bit messy, but this is the core of the program. removes whitespaces, line breaks, and comments. sometimes makes EOT error.
$pre1 = preg_replace('#<!--[^\[<>].*?(?<!!)-->#s', '', preg_replace('~>\s+<~', '><', trim(preg_replace('/\s\s+/', ' ', php_strip_whitespace(stripslashes(htmlspecialchars($uploadfile)))))));
$pre2 = preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $pre1);
$pre3 = str_replace(array("\r\n", "\r"), "\n", $pre2);
$pre4 = explode("\n", $pre3); // line endings were normalized to "\n" above
$pre5 = array();
foreach ($pre4 as $i => $line) {
if(!empty($line))
$pre5[] = trim($line);
}
$pre6 = implode($pre5);
echo $pre6;
To match <<<EOT, you could use <{3}[A-Z]{3}, or several other patterns, depending on how strictly you want to match that exact text.
Oh, I see what you're after now. I'm not great with PHP, but in regular expressions, you can capture a named group and then refer to that group in a replacement operation. You could use the following to capture <<<EOT into a group named Capture:
(?<Capture><{3}[A-Z]{3})
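In PHP, named groups appear by name in the matches array, so with preg_replace_callback() you can refer to it as:
$regs['Capture']
So the callback could return something like:
"\r\n".$regs['Capture']."\r\n"
...where $regs is the matches array passed to the callback. A plain preg_replace() replacement string cannot use the group name; use $1 or $0 there instead.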
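If you only need to wrap the match in line breaks, a named group isn't needed at all: in preg_replace() the replacement can use $0 for the whole match. A minimal sketch, assuming the marker is <<< followed by uppercase letters and an optional ; (the sample string is made up):

```php
<?php
// Hypothetical minified output containing a heredoc opener
$pre6 = 'echo<<<EOT;Hello';

// $0 is the whole match; keep it in a single-quoted part so PHP does not touch it
$pre6 = preg_replace('/<<<[A-Z]+;?/', "\n" . '$0' . "\n", $pre6);

echo $pre6; // "echo", "<<<EOT;" and "Hello" on three separate lines
```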
