Chop() is chopping more than what is asked

Chop() is chopping more than what is asked - php

I have multiple random strings, and I'm trying to pull "SpottedBlanket" out of the string. Some of them work fine:
DarkBaySpottedBlanket --
DarkBay
BaySpottedBlanket --
Bay
but others are cutting out more than it should.
RedRoanSpottedBlanket --
RedR
BlackSpottedBlanket --
Blac
DunSpottedBlanket --
Du
this is the code I'm using, but I thought it would be self explanatory:
$AppyShortcut = chop($AppyColor,"SpottedBlanket");
$AppyColor would obviously be the random generated string. Any clue why this is happening?

The chop function takes the string in the second argument - which in this case is "SpottedBlanket", and removes any contiguous characters that it finds from the right hand side.
So for the case of "RedRoanSpottedBlanket", you'd get back "RedR" because "o", "a", and "n" are letters that can be found in the string "SpottedBlanket".
chop() is usually used to remove trailing white space - a way of cleaning user input before performing some action on it.
Give your array:
$strings = ["DarkBaySpottedBlanket", "RedRoanSpottedBlanket", "BlackSpottedBlanket", "DunSpottedBlanket"];
What you might be looking for is somerthing like this:
foreach ($strings as $string) {
print substr($string, 0, strrpos($string, "SpottedBlanket")) . "\n";
}
This finds the position of the string from the end using strrpos(), then returns the start of the string until that position, using substr().

Related

replace the first 4 character with php

I have a string /en/products/saucony-switchback-iso/416.html and I would like to replace the first 4th character /en/ with /de/.
The result should be /de/products/saucony-switchback-iso/416.html
This is what I've tried:
$href = "/en/products/saucony-switchback-iso/416.html";
$href_replace = substr_replace($href, "/de/", 0);
its only returning "/de/"?

You also need to define the length of how much you're replacing in the string, which in your case is 4 (or 3, seeing as the trailing / is present in both) characters.
$href = "/en/products/saucony-switchback-iso/416.html";
$href_replace = substr_replace($href, "/de/", 0, 4);
echo $href_replace;
If you don't define a length as in your example, it defaults to the entire length of the string http://php.net/manual/en/function.substr-replace.php
length
If given and is positive, it represents the length of the portion of
string which is to be replaced. If it is negative, it represents the
number of characters from the end of string at which to stop
replacing. If it is not given, then it will default to strlen( string
); i.e. end the replacing at the end of string
Which is why you're only being left with /de/

If you want to replace /en/, str_replace is a better function
echo str_replace("/en/", "/de/", $href);

Technically you only need to do 3 (but whatever), it's simple.
echo "/de" .substr("/en/products/saucony-switchback-iso/416.html", 3);
Output
/de/products/saucony-switchback-iso/416.html
Sandbox
I also think substr will be about as fast as you can get it.

PHP Using str_word_count with strsplit to form array after x words

I've got a large string that I want to put in an array after each 50 words. I thought about using strsplit to cut, but realised that wont take the words in to consideration, just split when it gets to x char.
I've read about str_word_count but can't work out how to put the two together.
What I've got at the moment is:
$outputArr = str_split($output, 250);
foreach($outputArr as $arOut){
echo $arOut;
echo "<br />";
}
But I want to substitute that to form each item of the array at 50 words instead of 250 characters.
Any help will be much appreciated.

Assuming that str_word_count is sufficient for your needs¹, you can simply call it with 1 as the second parameter and then use array_chunk to group the words in groups of 50:
$words = str_word_count($string, 1);
$chunks = array_chunk($words, 50);
You now have an array of arrays; to join every 50 words together and make it an array of strings you can use
foreach ($chunks as &$chunk) { // important: iterate by reference!
$chunk = implode(' ', $chunk);
}
¹ Most probably it is not. If you want to get what most humans consider acceptable results when processing written language you will have to use preg_split with some suitable regular expression instead.

There's another way:
<?php
$someBigString = <<<SAMPLE
This, actually, is a nice' old'er string, as they said, "divided and conquered".
SAMPLE;
// change this to whatever you need to:
$number_of_words = 7;
$arr = preg_split("#([a-z]+[a-z'-]*(?<!['-]))#i",
$someBigString, $number_of_words + 1, PREG_SPLIT_DELIM_CAPTURE);
$res = implode('', array_slice($arr, 0, $number_of_words * 2));
echo $res;
Demo.
I consider preg_split a better tool (than str_word_count) here. Not because the latter is inflexible (it is not: you can define what symbols can make up a word with its third param), but because preg_split will essentially stop processing the string after getting N items.
The trick, as quite common with this function, is to capture delimiters as well, then use them to reconstruct the string with the first N words (where N is given) AND punctuation marks saved.
(of course, the regex used in my example does not strictly comply to str_word_count locale-dependent behavior. But it still restricts the words to consist of alpha, ' and - symbols, with the latter two not at the beginning and the end of any word).

PHP - Substring after X characters with special-characters

Sorry for the title, I really didn't know how to say this...
I often have a string that needs to be cut after X characters, my problem is that this string often contains special characters like : & egrave ;
So, I'm wondering, is their a way to know in php, without transforming my string, if when I am cutting my string, I am in the middle of a special char.
Example
This is my string with a special char : è - and I want it to cut in the middle of the "è" but still keeping the string intact
so right now my result with a sub string would be :
This is my string with a special char : &egra
but I want to have something like this :
This is my string with a special char : è

The best thing to do here is store your string as UTF-8 without any html entities, and use the mb_* family of functions with utf8 as the encoding.
But, if your string is ASCII or iso-8859-1/win1252, you can use the special HTML-ENTITIES encoding of the mb_string library:
$s = 'This is my string with a special char : è - and I want it to cut in the middle of the "è" but still keeping the string intact';
echo mb_substr($s, 0, 40, 'HTML-ENTITIES');
echo mb_substr($s, 0, 41, 'HTML-ENTITIES');
However, if your underlying string is UTF-8 or some other multibyte encoding, using HTML-ENTITIES is not safe! This is because HTML-ENTITIES really means "win1252 with high-bit characters as html entities". This is an example of where this can go wrong:
// Assuming that é is in utf8:
mb_substr('é ', 0, 2, 'HTML-ENTITIES') === 'Ã©'
// should be 'é '
When your string is in a multibyte encoding, you must instead convert all html entities to a common encoding before you split. E.g.:
$strings_actual_encoding = 'utf8';
$s_noentities = html_entity_decode($s, ENT_QUOTES, $strings_actual_encoding);
$s_trunc_noentities = mb_substr($s_noentities, 0, 41, $strings_actual_encoding);

The best solution would be to store your text as UTF-8, instead of storing them as HTML entities. Other than that, if you don't mind the count being off (&grave; equals one character, instead of 7), then the following snippet should work:
<?php
$string = 'This is my string with a special char : è - and I want it to cut in the middle of the "è" but still keeping the string intact';
$cut_string = htmlentities(mb_substr(html_entity_decode($string, NULL, 'UTF-8'), 0, 45), NULL, 'UTF-8')."<br><br>";
Note: If you use a different function to encode the text (e.g. htmlspecialchars()), then use that function instead of htmlentities(). If you use a custom function, then use another custom function that does the opposite of your new custom function instead of html_entity_decode() (and custom function instead of htmlentities()).

The longest HTML entity is 10 characters long, including the ampersand and semicolon. If you intend to cut the string at X bytes, check bytes X-9 through X-1 for an ampersand. If the corresponding semicolon appears at byte X or later, cut the string after the semicolon instead of after byte X.
However, if you're willing to preprocess the string, Mike's solution will be more accurate because his cuts the string at X characters, not bytes.

You can use html_entity_decode() first to decode all the HTML entities. Then split your string. Then htmlentities() to re-encode the entities.
$decoded_string = html_entity_decode($original_string);
// implement logic to split string here
// then for each string part do the following:
$encoded_string_part = htmlentities($split_string_part);

A little bruteforce solution, that I'm not really happy with would a PCRE expression, let's say that you want to pass 80 characters and the longest possible HTML expression is 7 chars long:
$regex = '~^(.{73}([^&]{7}|.{0,7}$|[^&]{0,6}&[^;]+;))(.*)~mx'
// Note, this could return a bit of shorter text
return preg_replace( $regexp, '$1', $text);
Just so you know:
.{73} - 73 characters
[^&]{7} - okay, we may fill it with anything that doesn't contain &
.{0,7}$ - keep in mind the possible end (this shouldn't be necessary because shorter text wouldn't match at all)
[^&]{0,6}&[^;]+; - up to 6 characters (you'd be at 79th), then & and let it finish
Something that seems much better but requires bit of play with numbers is to:
// check whether $text is at least $N chars long :)
if( strlen( $text) < $N){
return;
}
// Get last &
$pos = strrpos( $text, '&', $N);
// We're not young anymore, we have to check this too (not entries at all) :)
if( $pos === false){
return substr( $text, 0, $N);
}
// Get Last
$end = strpos( $text, ';', $N);
// false wouldn't be smaller then 0 (entry open at the beginning
if( $end === false){
$end = -1;
}
// Okay, entry closed (; is after &)(
if( $end > $pos){
return substr($text, 0, $N);
}
// Now we need to find first ;
$end = strpos( $text, ';', $N)
if( $end === false){
// Not valid HTML, not closed entry, do whatever you want
}
return substr($text, 0, $end);
Check numbers, there may be +/-1 somewhere in indexes...

I think you would have to use a combination of strpos and strrpos to find the next and previous spaces, parse the text between the spaces, check that against a known list of special characters, and if it matches, extend your "cut" to the position of the next space. If you had a code sample of what you have now, we could give you a better answer.

Regex to match specific string not enclosed by another, different specific string

I need a regex to match a string not enclosed by another different, specific string. For instance, in the following situation it would split the content into two groups: 1) The content before the second {Switch} and 2) The content after the second {Switch}. It wouldn't match the first {Switch} because it is enclosed by {my_string}'s. The string will always look like shown below (i.e. {my_string}any content here{/my_string})
Some more
{my_string}
Random content
{Switch} //This {Switch} may or may not be here, but should be ignored if it is present
More random content
{/my_string}
Content here too
{Switch}
More content
So far I've gotten what is below which I know isn't very close at all:
(.*?)\{Switch\}(.*?)
I'm just not sure how to use the [^] (not operator) with a specific string versus different characters.

It really seems you're trying to use a regular expression to parse a grammar - something that regular expressions are really bad at doing. You might be better off writing a parser to break down your string into the tokens that build it, and then processing that tree.
Perhaps something like http://drupal.org/project/grammar_parser might help.

Try this simple function:
function find_content()
function find_content($doc) {
$temp = $doc;
preg_match_all('~{my_string}.*?{/my_string}~is', $temp, $x);
$i = 0;
while (isset($x[0][$i])) {
$temp = str_replace($x[0][$i], "{REPL:$i}", $temp);
$i++;
}
$res = explode('{Switch}', $temp);
foreach ($res as &$part)
foreach($x[0] as $id=>$content)
$part = str_replace("{REPL:$id}", $content, $part);
return $res;
}
Use it this way
$content_parts = find_content($doc); // $doc is your input document
print_r($content_parts);
Output (your example)
Array
(
[0] => Some more
{my_string}
Random content
{Switch} //This {Switch} may or may not be here, but should be ignored if it is present
More random content
{/my_string}
Content here too
[1] =>
More content
)

You can try positive lookahead and lookbehind assertions (http://www.regular-expressions.info/lookaround.html)
It might look something like this:
$content = 'string of text before some random content switch text some more random content string of text after';
$before = preg_quote('String of text before');
$switch = preg_quote('switch text');
$after = preg_quote('string of text after');
if( preg_match('/(?<=' $before .')(.*)(?:' $switch .')?(.*)(?=' $after .')/', $content, $matches) ) {
// $matches[1] == ' some random content '
// $matches[2] == ' some more random content '
}

$regex = (?:(?!\{my_string\})(.*?))(\{Switch\})(?:(.*?)(?!\{my_string\}));
/* if "my_string" and "Switch" aren't wrapped by "{" and "}" just remove "\{" and "\}" */
$yourNewString = preg_replace($regex,"$1",$yourOriginalString);
This might work. Can't test it know, but i'll update later!
I don't if this is what you're looking for, but to negate more than one character, the regex syntax is:
(?!yourString)
and it is called "negative lookahead assertion".
/Edit:
This should work and return true:
$stringMatchesYourRulesBoolean = preg_match('~(.*?)('.$my_string.')(.*?)(?<!'.$my_string.') ?('.$switch.') ?(?!'.$my_string.')(.*?)('.$my_string.')(.*?)~',$yourString);

Have a look at PHP PEG. It is a little parser written in PHP. You can write your own grammar and parse it. It's going to be very simple in your case.
The grammar syntax and the way of parsing is all explained in the README.md
Extracts from the readme:
token* - Token is optionally repeated
token+ - Token is repeated at least one
token? - Token is optionally present
Tokens may be :
- bare-words, which are recursive matchers - references to token rules defined elsewhere in the grammar,
- literals, surrounded by `"` or `'` quote pairs. No escaping support is provided in literals.
- regexs, surrounded by `/` pairs.
- expressions - single words (match \w+)
Sample grammar: (file EqualRepeat.peg.inc)
class EqualRepeat extends Packrat {
/* Any number of a followed by the same number of b and the same number of c characters
* aabbcc - good
* aaabbbccc - good
* aabbc - bad
* aabbacc - bad
*/
/*Parser:Grammar1
A: "a" A? "b"
B: "b" B? "c"
T: !"b"
X: &(A !"b") "a"+ B !("a" | "b" | "c")
*/
}

How to write regex to return only certain parts of this string?

So I'm working on a project that will allow users to enter poker hand histories from sites like PokerStars and then display the hand to them.
It seems that regex would be a great tool for this, however I rank my regex knowledge at "slim to none".
So I'm using PHP and looping through this block of text line by line and on lines like this:
Seat 1: fabulous29 (835 in chips)
Seat 2: Nioreh_21 (6465 in chips)
Seat 3: Big Loads (3465 in chips)
Seat 4: Sauchie (2060 in chips)
I want to extract seat number, name, & chip count so the format is
Seat [number]: [letters&numbers&characters] ([number] in chips)
I have NO IDEA where to start or what commands I should even be using to optimize this.
Any advice is greatly appreciated - even if it is just a link to a tutorial on PHP regex or the name of the command(s) I should be using.

I'm not entirely sure what exactly to use for that without trying it, but a great tool I use all the time to validate my RegEx is RegExr which gives a great flash interface for trying out your regex, including real time matching and a library of predefined snippets to use. Definitely a great time saver :)

Something like this might do the trick:
/Seat (\d+): ([^\(]+) \((\d+)in chips\)/
And some basic explanation on how Regex works:
\d = digit.
\<character> = escapes character, if not part of any character class or subexpression. for example:
\t
would render a tab, while \\t would render "\t" (since the backslash is escaped).
+ = one or more of the preceding element.
* = zero or more of the preceding element.
[ ] = bracket expression. Matches any of the characters within the bracket. Also works with ranges (ex. A-Z).
[^ ] = Matches any character that is NOT within the bracket.
( ) = Marked subexpression. The data matched within this can be recalled later.
Anyway, I chose to use
([^\(]+)
since the example provides a name containing spaces (Seat 3 in the example). what this does is that it matches any character up to the point that it encounters an opening paranthesis.
This will leave you with a blank space at the end of the subexpression (using the data provided in the example). However, his can easily be stripped away using the trim() command in PHP.
If you do not want to match spaces, only alphanumerical characters, you could so something like this:
([A-Za-z0-9-_]+)
Which would match any letter (within A-Z, both upper- & lower-case), number as well as hyphens and underscores.
Or the same variant, with spaces:
([A-Za-z0-9-_\s]+)
Where "\s" is evaluated into a space.
Hope this helps :)

Look at the PCRE section in the PHP Manual. Also, http://www.regular-expressions.info/ is a great site for learning regex. Disclaimer: Regex is very addictive once you learn it.

I always use the preg_ set of function for REGEX in PHP because the PERL-compatible expressions have much more capability. That extra capability doesn't necessarily come into play here, but they are also supposed to be faster, so why not use them anyway, right?
For an expression, try this:
/Seat (\d+): ([^ ]+) \((\d+)/
You can use preg_match() on each line, storing the results in an array. You can then get at those results and manipulate them as you like.
EDIT:
Btw, you could also run preg_match_all on the entire block of text (instead of looping through line-by-line) and get the results that way, too.

Check out preg_match.
Probably looking for something like...
<?php
$str = 'Seat 1: fabulous29 (835 in chips)';
preg_match('/Seat (?<seatNo>\d+): (?<name>\w+) \((?<chipCnt>\d+) in chips\)/', $str, $matches);
print_r($matches);
?>
*It's been a while since I did php, so this could be a little or a lot off.*

May be it is very late answer, But I am interested in answering
Seat\s(\d):\s([\w\s]+)\s\((\d+).*\)
http://regex101.com/r/cU7yD7/1

Here's what I'm currently using:
preg_match("/(Seat \d+: [A-Za-z0-9 _-]+) \((\d+) in chips\)/",$line)

To process the whole input string at once, use preg_match_all()
preg_match_all('/Seat (\d+): \w+ \((\d+) in chips\)/', $preg_match_all, $matches);
For your input string, var_dump of $matches will look like this:
array
0 =>
array
0 => string 'Seat 1: fabulous29 (835 in chips)' (length=33)
1 => string 'Seat 2: Nioreh_21 (6465 in chips)' (length=33)
2 => string 'Seat 4: Sauchie (2060 in chips)' (length=31)
1 =>
array
0 => string '1' (length=1)
1 => string '2' (length=1)
2 => string '4' (length=1)
2 =>
array
0 => string '835' (length=3)
1 => string '6465' (length=4)
2 => string '2060' (length=4)
On learning regex: Get Mastering Regular Expressions, 3rd Edition. Nothing else comes close to the this book if you really want to learn regex. Despite being the definitive guide to regex, the book is very beginner friendly.

Try this code. It works for me
Let say that you have below lines of strings
$string1 = "Seat 1: fabulous29 (835 in chips)";
$string2 = "Seat 2: Nioreh_21 (6465 in chips)";
$string3 = "Seat 3: Big Loads (3465 in chips)";
$string4 = "Seat 4: Sauchie (2060 in chips)";
Add to array
$lines = array($string1,$string2,$string3,$string4);
foreach($lines as $line )
{
$seatArray = explode(":", $line);
$seat = explode(" ",$seatArray[0]);
$seatNumber = $seat[1];
$usernameArray = explode("(",$seatArray[1]);
$username = trim($usernameArray[0]);
$chipArray = explode(" ",$usernameArray[1]);
$chipNumber = $chipArray[0];
echo "<br>"."Seat [".$seatNumber."]: [". $username."] ([".$chipNumber."] in chips)";
}

you'll have to split the file by linebreaks,
then loop thru each line and apply the following logic
$seat = 0;
$name = 1;
$chips = 2;
foreach( $string in $file ) {
if (preg_match("Seat ([1-0]): ([A-Za-z_0-9]*) \(([1-0]*) in chips\)", $string, $matches)) {
echo "Seat: " . $matches[$seat] . "<br>";
echo "Name: " . $matches[$name] . "<br>";
echo "Chips: " . $matches[$chips] . "<br>";
}
}
I haven't ran this code, so you may have to fix some errors...

Seat [number]: [letters&numbers&characters] ([number] in chips)
Your Regex should look something like this
Seat (\d+): ([a-zA-Z0-9]+) \((\d+) in chips\)
The brackets will let you capture the seat number, name and number of chips in groups.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

Chop() is chopping more than what is asked - php

Related

replace the first 4 character with php

PHP Using str_word_count with strsplit to form array after x words

PHP - Substring after X characters with special-characters

Regex to match specific string not enclosed by another, different specific string

How to write regex to return only certain parts of this string?

Categories

Resources