preg_split - New Line - php

I'm currently developing a script that takes a message, splits it apart every 100 characters or so, and sends each part.
However, my original string has "\n" line breaks in it, and this is causing an issue with my preg_split, making it split prematurely (e.g., before 100 characters).
Here's what I am currently working with:
$sMessage = "Msg from John Smith \n".
"SUBJ: Hello! \n".
"This is the bulk of the message and can be well over 200 or 300 characters long. \n".
"To reply, type R 131, then ur msg.";
$split = preg_split('/(.{0,100})\s/',
$sMessage,
0,
PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
array_shift($split);
$count = count($split);
foreach ($split as $key => $message) {
$part = sprintf("(%d/%d) %s", $key+1, $count, $message);
echo $part;
echo "<br>";
}
Now, with this, I've noticed 3 things:
1) The first part of the message ("Msg from John Smith") does not even get included in the array.
2) The new lines (\n) seem to cut the string early.
3) Oddly enough, with the last line of the message ("To reply" etc...) it cuts off the last word ("msg.") and adds that into a new line, no matter what the sentence may read.
Any help on this would be great!

You are actually trying to reimplement the function wordwrap(). I think the call that does the job you need would look like this:
$array = explode("\n", wordwrap($string, 100, "\n", true));
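A minimal sketch of how that could look for the message in the question. The helper name split_message and the decision to collapse the existing "\n" characters into spaces first are my assumptions, not part of the answer above; without that step, wordwrap() keeps the original line breaks and explode() would still split on them.

```php
<?php
// Hypothetical helper, not part of the original answer.
function split_message(string $message, int $width = 100): array
{
    // Assumption: existing line breaks are not meaningful, so collapse all
    // whitespace runs (including "\n") into single spaces first.
    $flat = trim(preg_replace('/\s+/', ' ', $message));

    // wordwrap() inserts "\n" at word boundaries; explode() turns that into an array.
    $chunks = explode("\n", wordwrap($flat, $width, "\n", true));

    $count = count($chunks);
    $parts = [];
    foreach ($chunks as $i => $chunk) {
        // Same "(n/total) text" format as in the question.
        $parts[] = sprintf('(%d/%d) %s', $i + 1, $count, $chunk);
    }
    return $parts;
}

$sMessage = "Msg from John Smith \n"
    . "SUBJ: Hello! \n"
    . "This is the bulk of the message and can be well over 200 or 300 characters long. \n"
    . "To reply, type R 131, then ur msg.";

foreach (split_message($sMessage) as $part) {
    echo $part, "<br>";
}
```

If the line breaks do matter, skip the preg_replace() step and accept that some parts will be shorter than 100 characters.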

my original string has "\n" lines in it
Use the PCRE_DOTALL modifier (in PHP, the s flag after the closing delimiter, e.g. /.../s) to allow ‘.’ to match newlines.
Chunking a string into an array of fixed-length strings can more easily be done using str_split. As soulmerge suggests, wordwrap is probably better if that's what you're really trying to do. You can use explode to split a string out to an array afterwards.
Avoid resorting to regex where string processing functions are available. (And PHP has a lot.)
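To illustrate both points, here is a small sketch. The sample string and variable names are made up for the example; it only shows how '.' behaves with and without the s modifier, and what str_split() does.

```php
<?php
$text = "line one\nline two";

// Without the s modifier, '.' stops at the newline.
preg_match('/.+/', $text, $m1);

// With the s modifier (PCRE_DOTALL), '.' also matches "\n".
preg_match('/.+/s', $text, $m2);

// str_split() cuts into fixed-length pieces and ignores word boundaries.
$pieces = str_split($text, 5);

var_dump($m1[0], $m2[0], $pieces);
```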

Related

PHP replacing lines in a small paragraph from a big paragraph

I have 2 bulks of text: Trunk, and Card. Trunk has about 100 lines, and Card has 3. The 3 lines in Card exist in Trunk, but they aren't directly below each other.
What I'm trying to do is remove each line of Card from the string Trunk.
What came to mind is exploding Card into an array and using a for-each-in loop, like in AS3, but that didn't work as planned. Here's my attempt:
$zarray = explode("\n", $card); //Exploding the 3 lines which were separated by linebreaks into an array
foreach ($zarray as $oneCard) //for each element of array
{
$oneCard.= "\n"; //add a linebreak at the end, so that when the text is removed from Trunk, there won't be an empty line in it.
print "$oneCard stuff"; //Strangely outputs all 3 elements of the array separated by \r, instead of just 1, like this:
//card1\rcard2\rcard3 stuff
$zard = preg_replace("/$oneCard/i", "", $trunx, 1);//removes the line in Card from Trunk, case insensitive.
$trunx = $zard; //Trunk should now be one line shorter.
}
So, how can I use the foreach loop so that it replaces properly, and uses 1 element each time, instead of all of them in one go?
Consider
$trunk = "
a
b
c
d
e
f
";
$card = "
c
e
a
";
$newtrunk = implode("\n", array_diff(
explode("\n", $trunk),
explode("\n", $card)
));
print $newtrunk; // b d f
Or the other way round, your wording is a bit unclear.
Try this, it will be faster than the preg_replace due to the small amount being replaced:
//Find the new lines and add a delimiter
$card = str_replace("\n", "\n|#|", $card);
//Explode at the delimiter
$replaceParts = explode('|#|', $card);
//Perform the replacement with str_replace and use the array
$text = str_replace($replaceParts, '', $text);
This assumes there is always a newline after the search part and that a case-sensitive match is acceptable.
If you do not know about the new line you will need a regex with an optional match for the newline.
If you need it case-insensitive, look at str_ireplace.
You could explode the $card, and keep the $trunk as string:
$needlearray = explode("\n", $card); //to generate the needle-array
$trunk = str_replace($needlearray,array(),$trunk); //replace the needle array by an empty array => replace by an empty string (according to the str_replace manual)
$trunk = str_replace("\n\n","\n",$trunk); //replace adjacent line breaks by only one line break
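The "\r" in the question's output suggests the text may not use plain "\n" line endings, and the question asks for a case-insensitive match. A sketch along the lines of the array_diff answer that covers both could look like this. The function name remove_lines is made up, and dropping empty Card lines is my assumption so that blank lines in Trunk are left alone.

```php
<?php
// Hypothetical helper: remove every line of $card from $trunk.
function remove_lines(string $trunk, string $card): string
{
    // \R matches any line ending: "\n", "\r\n" or "\r".
    $trunkLines = preg_split('/\R/', $trunk);
    $cardLines = preg_split('/\R/', $card);

    // Assumption: empty lines in $card should not remove blank lines from $trunk.
    $cardLines = array_filter($cardLines, function ($line) {
        return $line !== '';
    });

    // array_udiff() with strcasecmp compares lines case-insensitively.
    $kept = array_udiff($trunkLines, $cardLines, 'strcasecmp');

    return implode("\n", $kept);
}

echo remove_lines("a\r\nb\r\nC\r\nd", "c\r\na");
```

Note that the result is joined with "\n", so the original line endings are normalised.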

PHP and regex to cut from beginning to new line

I don't have any experience with regex in PHP (just some copy/pasting for simple tasks), so I can't solve this simple situation, and I hope that someone can help.
I have large files that look like:
xxxxxxxxxxxxxx
xxxxxxxxxxxxxxx
xxxxxxxxxxxxxx
text,text,text,
text,text,tec,etc
Now I need to write a PHP function that, given a string with content like this, returns the part that I marked here as 'xxx...'.
After every 'xxxx...' sequence at the beginning of the file there is, as you can see, a new line (\r\n), and I don't know how to use regex to cut the upper part into another string, so please help.
I'd skip the regex.
$eol = strpos($s, "\r\n");
$header = substr($s, 0, $eol);
$rest = substr($s, $eol + 2);
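One thing to watch with this approach: strpos() returns false when there is no "\r\n" at all, and the substr() calls would then give odd results. A hedged sketch with that case handled follows. The function name and the choice to return the whole string as the header are my assumptions.

```php
<?php
// Hypothetical helper: split at the first line ending; returns [header, rest].
function split_at_first_eol(string $s, string $eol = "\r\n"): array
{
    $pos = strpos($s, $eol);
    if ($pos === false) {
        // Assumption: with no line ending, treat the whole string as the header.
        return [$s, ''];
    }
    return [substr($s, 0, $pos), substr($s, $pos + strlen($eol))];
}

list($header, $rest) = split_at_first_eol("xxxxxxxx\r\ntext,text,text,\r\ntext,text,tec,etc");
```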
If the only part of the string separated by two new lines is the header series of 'xxxx' and the following 'text,text,text' lines, you could simply use explode. Passing a limit of 2 means it will only explode at the first instance of "\n\n", returning an array of at most 2 items:
list($header, $body) = explode("\n\n", $file_contents, 2);

Does anyone have a PHP snippet of code for grabbing the first "sentence" in a string?

If I have a description like:
"We prefer questions that can be answered, not just discussed. Provide details. Write clearly and simply."
And all I want is:
"We prefer questions that can be answered, not just discussed."
I figure I would search for a regular expression, like "[.!\?]", determine the strpos and then do a substr from the main string, but I imagine it's a common thing to do, so hoping someone has a snippet lying around.
A slightly more costly expression, but one that is more adaptable if you want to treat several punctuation marks as sentence terminators:
$sentence = preg_replace('/([^?!.]*.).*/', '\\1', $string);
Find termination characters followed by a space
$sentence = preg_replace('/(.*?[?!.](?=\s|$)).*/', '\\1', $string);
<?php
$text = "We prefer questions that can be answered, not just discussed. Provide details. Write clearly and simply.";
$array = explode('.',$text);
$text = $array[0];
?>
My previous regex seemed to work in the tester but not in actual PHP. I have edited this answer to provide full, working PHP code, and an improved regex.
$string = 'A simple test!';
var_dump(get_first_sentence($string));
$string = 'A simple test without a character to end the sentence';
var_dump(get_first_sentence($string));
$string = '... But what about me?';
var_dump(get_first_sentence($string));
$string = 'We at StackOverflow.com prefer prices below US$ 7.50. Really, we do.';
var_dump(get_first_sentence($string));
$string = 'This will probably break after this pause .... or won\'t it?';
var_dump(get_first_sentence($string));
function get_first_sentence($string) {
$array = preg_split('/(^.*\w+.*[\.\?!][\s])/', $string, -1, PREG_SPLIT_DELIM_CAPTURE);
// You might want to count() but I chose not to, just add
return trim($array[0] . ($array[1] ?? '')); // ?? '' avoids a notice when nothing matched
}
Try this:
$content = "My name is Younas. I live on the pakistan. My email is **fromyounas#gmail.com** and skype name is \"**fromyounas**\". I loved to work in **IOS development** and website development . ";
$dot = ".";
//find first dot position
$position = stripos ($content, $dot);
//if there's a dot in our source text do
if($position) {
//prepare offset
$offset = $position + 1;
//find second dot using offset
$position2 = stripos ($content, $dot, $offset);
$result = substr($content, 0, $position2);
//add a dot
echo $result . '.';
}
Output is:
My name is Younas. I live on the pakistan.
current(explode(".",$input));
I'd probably use any of the multitudes of substring/string-split functions in PHP (some mentioned here already).
But also look for ". " OR ".\n" (and possibly ".\r\n") instead of just ".", in case the sentence contains a period that isn't followed by a space. I think it will improve the likelihood of you getting genuine results.
Example, searching for just "." on:
"I like stackoverflow.com."
Will get you:
"I like stackoverflow."
When really, I'm sure you'd prefer:
"I like stackoverflow.com."
And once you have that basic search, you'll probably come across one or two occasions where it may miss something. Tune as you run with it!
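A sketch of that idea as a single function. The name first_sentence is made up, and treating ".", "!" and "?" followed by whitespace or end of string as the terminator is an assumption you will likely want to tune.

```php
<?php
// Hypothetical helper: return the text up to the first sentence terminator
// that is followed by whitespace or the end of the string.
function first_sentence(string $text): string
{
    // .*? is lazy, so it stops at the first terminator that passes the lookahead.
    if (preg_match('/^.*?[.!?](?=\s|$)/s', $text, $m)) {
        return $m[0];
    }
    // No terminator found: return the input unchanged.
    return $text;
}

echo first_sentence("I like stackoverflow.com. It is useful.");
```

The period inside "stackoverflow.com" is skipped because it is followed by a letter rather than whitespace.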
Try this:
reset(explode('.', $s, 2));

Delete first four lines from the top in content stored in a variable

I have a variable that needs the first four lines stripped out before being displayed:
Error Report Submission
From: First Last, email#example.com, 12345
Date: 2009-04-16 04:33:31 pm Eastern
The content to be output starts here and can go on for any number of lines.
I need to remove the 'header' from this data before I display it as part of a 'pending error reports' view.
Mmm. I am sure someone is going to come up with something nifty/shorter/nicer, but how about:
$str = implode("\n", array_slice(explode("\n", $str), 4));
If that is too unsightly, you can always abstract it away:
function str_chop_lines($str, $lines = 4) {
return implode("\n", array_slice(explode("\n", $str), $lines));
}
$str = str_chop_lines($str);
EDIT: Thinking about it some more, I wouldn't recommend using the str_chop_lines function unless you plan on doing this in many parts of your application. The original one-liner is clear enough, I think, and anyone stumbling upon str_chop_lines may not realize the default is 4 without going to the function definition.
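If the body can be long, a possible variation is to let explode() stop after the header by passing a limit, so the rest of the text is never split and re-joined. This is only a sketch; the function name drop_first_lines is made up.

```php
<?php
// Hypothetical helper: drop the first $lines lines, splitting only as far as needed.
function drop_first_lines(string $str, int $lines = 4): string
{
    // With a limit of $lines + 1, the last element holds the untouched remainder.
    $parts = explode("\n", $str, $lines + 1);

    // If the input has fewer lines than requested, nothing is left to return.
    return $parts[$lines] ?? '';
}

echo drop_first_lines("a\nb\nc\nd\ne\nf");
```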
$content = preg_replace("/^(.*\n){4}/", "", $content);
Strpos helps out a lot: Here's an example:
// $myString = "blah blah \n \n \n etc \n \n blah blah";
$len = strpos($myString, "\n\n");
$string = substr($myString, $len, strlen($myString) - $len);
$string then contains the string after finding those two newlines in a row.
Split the string into an array using preg_split(rex) (the older split() function was removed in PHP 7), where rex matches two consecutive newlines, and then concatenate the entire array, except for the first element (which is the header).

How to write regex to return only certain parts of this string?

So I'm working on a project that will allow users to enter poker hand histories from sites like PokerStars and then display the hand to them.
It seems that regex would be a great tool for this, however I rank my regex knowledge at "slim to none".
So I'm using PHP and looping through this block of text line by line and on lines like this:
Seat 1: fabulous29 (835 in chips)
Seat 2: Nioreh_21 (6465 in chips)
Seat 3: Big Loads (3465 in chips)
Seat 4: Sauchie (2060 in chips)
I want to extract seat number, name, & chip count so the format is
Seat [number]: [letters&numbers&characters] ([number] in chips)
I have NO IDEA where to start or what commands I should even be using to optimize this.
Any advice is greatly appreciated - even if it is just a link to a tutorial on PHP regex or the name of the command(s) I should be using.
I'm not entirely sure what exactly to use for that without trying it, but a great tool I use all the time to validate my RegEx is RegExr which gives a great flash interface for trying out your regex, including real time matching and a library of predefined snippets to use. Definitely a great time saver :)
Something like this might do the trick:
/Seat (\d+): ([^\(]+) \((\d+) in chips\)/
And some basic explanation on how Regex works:
\d = digit.
\<character> = escapes character, if not part of any character class or subexpression. for example:
\t
would render a tab, while \\t would render "\t" (since the backslash is escaped).
+ = one or more of the preceding element.
* = zero or more of the preceding element.
[ ] = bracket expression. Matches any of the characters within the bracket. Also works with ranges (ex. A-Z).
[^ ] = Matches any character that is NOT within the bracket.
( ) = Marked subexpression. The data matched within this can be recalled later.
Anyway, I chose to use
([^\(]+)
since the example provides a name containing spaces (Seat 3 in the example). What this does is match any character up to the point where it encounters an opening parenthesis.
This will leave you with a blank space at the end of the subexpression (using the data provided in the example). However, this can easily be stripped away using the trim() function in PHP.
If you do not want to match spaces, only alphanumeric characters, you could do something like this:
([A-Za-z0-9-_]+)
Which would match any letter (within A-Z, both upper- & lower-case), number as well as hyphens and underscores.
Or the same variant, with spaces:
([A-Za-z0-9-_\s]+)
Where "\s" matches any whitespace character (space, tab, newline).
Hope this helps :)
Look at the PCRE section in the PHP Manual. Also, http://www.regular-expressions.info/ is a great site for learning regex. Disclaimer: Regex is very addictive once you learn it.
I always use the preg_ family of functions for regex in PHP because the Perl-compatible expressions have much more capability. That extra capability doesn't necessarily come into play here, but they are also supposed to be faster, so why not use them anyway, right?
For an expression, try this:
/Seat (\d+): ([^ ]+) \((\d+)/
You can use preg_match() on each line, storing the results in an array. You can then get at those results and manipulate them as you like.
EDIT:
Btw, you could also run preg_match_all on the entire block of text (instead of looping through line-by-line) and get the results that way, too.
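A sketch of the preg_match_all route on the sample lines from the question. Using a lazy (.+?) for the name, so that names with spaces like "Big Loads" are captured, is my own choice rather than the pattern given above, and the array keys are made up.

```php
<?php
$history = "Seat 1: fabulous29 (835 in chips)\n"
    . "Seat 2: Nioreh_21 (6465 in chips)\n"
    . "Seat 3: Big Loads (3465 in chips)\n"
    . "Seat 4: Sauchie (2060 in chips)";

// The m modifier lets ^ match at the start of every line.
// (.+?) is lazy: the name may contain spaces and ends before " (<digits> in chips)".
$pattern = '/^Seat (\d+): (.+?) \((\d+) in chips\)/m';

// PREG_SET_ORDER groups the captures per matched line.
preg_match_all($pattern, $history, $matches, PREG_SET_ORDER);

$seats = [];
foreach ($matches as $m) {
    $seats[] = ['seat' => (int) $m[1], 'name' => $m[2], 'chips' => (int) $m[3]];
}

print_r($seats);
```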
Check out preg_match.
Probably looking for something like...
<?php
$str = 'Seat 1: fabulous29 (835 in chips)';
preg_match('/Seat (?<seatNo>\d+): (?<name>\w+) \((?<chipCnt>\d+) in chips\)/', $str, $matches);
print_r($matches);
?>
*It's been a while since I did php, so this could be a little or a lot off.*
Maybe this is a very late answer, but I am interested in answering:
Seat\s(\d):\s([\w\s]+)\s\((\d+).*\)
http://regex101.com/r/cU7yD7/1
Here's what I'm currently using:
preg_match("/(Seat \d+: [A-Za-z0-9 _-]+) \((\d+) in chips\)/",$line)
To process the whole input string at once, use preg_match_all()
preg_match_all('/Seat (\d+): \w+ \((\d+) in chips\)/', $preg_match_all, $matches);
For your input string, var_dump of $matches will look like this:
array
0 =>
array
0 => string 'Seat 1: fabulous29 (835 in chips)' (length=33)
1 => string 'Seat 2: Nioreh_21 (6465 in chips)' (length=33)
2 => string 'Seat 4: Sauchie (2060 in chips)' (length=31)
1 =>
array
0 => string '1' (length=1)
1 => string '2' (length=1)
2 => string '4' (length=1)
2 =>
array
0 => string '835' (length=3)
1 => string '6465' (length=4)
2 => string '2060' (length=4)
On learning regex: Get Mastering Regular Expressions, 3rd Edition. Nothing else comes close to this book if you really want to learn regex. Despite being the definitive guide to regex, the book is very beginner friendly.
Try this code. It works for me
Let's say that you have the lines below as strings
$string1 = "Seat 1: fabulous29 (835 in chips)";
$string2 = "Seat 2: Nioreh_21 (6465 in chips)";
$string3 = "Seat 3: Big Loads (3465 in chips)";
$string4 = "Seat 4: Sauchie (2060 in chips)";
Add to array
$lines = array($string1,$string2,$string3,$string4);
foreach($lines as $line )
{
$seatArray = explode(":", $line);
$seat = explode(" ",$seatArray[0]);
$seatNumber = $seat[1];
$usernameArray = explode("(",$seatArray[1]);
$username = trim($usernameArray[0]);
$chipArray = explode(" ",$usernameArray[1]);
$chipNumber = $chipArray[0];
echo "<br>"."Seat [".$seatNumber."]: [". $username."] ([".$chipNumber."] in chips)";
}
you'll have to split the file by linebreaks,
then loop thru each line and apply the following logic
$seat = 1; // index 0 holds the whole match, so the capture groups start at 1
$name = 2;
$chips = 3;
foreach ($file as $string) {
if (preg_match('/Seat ([0-9]+): ([A-Za-z_0-9]*) \(([0-9]+) in chips\)/', $string, $matches)) {
echo "Seat: " . $matches[$seat] . "<br>";
echo "Name: " . $matches[$name] . "<br>";
echo "Chips: " . $matches[$chips] . "<br>";
}
}
I haven't run this code, so you may have to fix some errors...
Seat [number]: [letters&numbers&characters] ([number] in chips)
Your Regex should look something like this
Seat (\d+): ([a-zA-Z0-9]+) \((\d+) in chips\)
The brackets will let you capture the seat number, name and number of chips in groups.
