Stripos start from end of string - php

I have some code that searches a huge log file for a keyword. In the majority of cases the keyword is near the bottom of the document, so it would be more efficient to start the search at the bottom and work my way up.
$pos = stripos($body,$keyword);
$snippet_pre = substr($body, $pos, SNIPPET_LENGTH);
I've looked at strripos, and although it does what I want (it finds the last occurrence), it sounds like it searches from the beginning of the document, so I'll be adding a lot of unnecessary work to my query, as most of the keywords are near the bottom of the string/document.
Any ideas?

Explode your log file by linebreaks to get an array. Reverse your array, and now you can search line by line from the end.
$lines = explode("\n",$body);
$reversed = array_reverse($lines);
foreach($reversed AS $line) {
// Search for your keyword
}
If you are talking about a massive log file, such that you absolutely do not want to read it all into memory, you could also look at a reverse seek approach, though that's typically not needed. See here:
Read a file backwards line by line using fseek
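A minimal sketch of the reverse seek idea, assuming the goal is the byte offset of the last case-insensitive match. The helper name find_last_in_file and the chunk size are hypothetical and not taken from the linked question; the file is read from the end in fixed-size chunks, with a small overlap so that a keyword straddling two chunks is still found.

```php
<?php
// Hypothetical helper: returns the byte offset of the last case-insensitive
// occurrence of $keyword in the file at $path, or false if there is none.
function find_last_in_file($path, $keyword, $chunkSize = 8192)
{
    if ($keyword === '') {
        return false;
    }
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return false;
    }
    $stat = fstat($handle);
    $end = $stat['size'];               // exclusive end of the region not yet searched
    $overlap = strlen($keyword) - 1;    // extra bytes so a straddling match is complete
    $found = false;
    while ($end > 0) {
        $start = max(0, $end - $chunkSize);
        fseek($handle, $start);
        // Read this chunk plus a few bytes of the chunk that follows it.
        $chunk = fread($handle, ($end - $start) + $overlap);
        $pos = strripos($chunk, $keyword);
        if ($pos !== false) {
            $found = $start + $pos;     // translate to an offset within the file
            break;
        }
        $end = $start;                  // move one chunk towards the start
    }
    fclose($handle);
    return $found;
}
```

Only the chunks from the end of the file up to the first hit are ever read, so memory use stays at roughly one chunk regardless of the file size.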

strripos will start from the end and search backwards if you set a negative offset as the third parameter.
$pos = strripos($body, $keyword, $offset);
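As a small sketch of how strripos could slot into the snippet code from the question: the sample log text and the $snippetLength variable (standing in for SNIPPET_LENGTH) are assumptions made for illustration. The offset argument is left out here, so the whole string is eligible and the last occurrence is returned.

```php
<?php
// Sample data, assumed for illustration only.
$body = "error a\ninfo b\nERROR c\n";
$keyword = 'error';
$snippetLength = 7; // stands in for SNIPPET_LENGTH from the question

// Case-insensitive position of the LAST occurrence, or false if absent.
$pos = strripos($body, $keyword);

// Guard against false before calling substr.
$snippet = ($pos === false) ? '' : substr($body, $pos, $snippetLength);
```

Checking for false with !== or === matters because a match at position 0 is also falsy.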


PHP Regex to remove everything after a character

So I've seen a couple articles that go a little too deep, so I'm not sure what to remove from the regex statements they make.
I've basically got this
foo:bar all the way to anotherfoo:bar;seg98y34g.?sdebvw h segvu (anything goes really)
I need a PHP regex to remove EVERYTHING after the colon. The first part can be any length (but it never contains a colon), so in both cases above I'd end up with
foo and anotherfoo
after doing something like this horrendous example of pseudo-code
$string = 'foo:bar';
$newstring = regex_to_remove_everything_after_":"($string);
EDIT
after posting this, would an explode() work reliably enough? Something like
$pieces = explode(':', 'foo:bar');
$newstring = $pieces[0];
explode would do what you're asking for, but you can make it one step by using current.
$beforeColon = current(explode(':', $string));
I would not use a regex here (that involves some work behind the scenes for a relatively simple action), nor would I use strpos with substr (as that would, effectively, be traversing the string twice). Most importantly, this provides the person who reads the code with an immediate, "Ah, yes, that is what the author is trying to do!" instead of, "Wait, what is happening again?"
The only exception to that is if you happen to know that the string is excessively long: I would not explode a 1 Gb file. Instead:
$beforeColon = substr($string, 0, strpos($string,':'));
I also feel substr isn't quite as easy to read: in current(explode you can see the delimiter immediately with no extra function calls, and there is only one instance of the variable (which makes it less prone to human error). Basically I read current(explode as "I am taking the first instance of anything prior to this string" as opposed to substr, which is "I am getting a substring starting at the 0 position and continuing until this string."
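One difference between the two forms is worth spelling out: when the string contains no colon, strpos returns false, substr then treats the length as 0, and the result is an empty string, whereas explode returns the whole string as its first element. A minimal sketch, with before_colon as a hypothetical helper name and a limit of 2 passed to explode so that at most one split is performed:

```php
<?php
// Hypothetical helper: everything before the first colon,
// or the whole string when there is no colon at all.
function before_colon($string)
{
    // Limit 2: split once at the first colon and leave the rest untouched.
    $parts = explode(':', $string, 2);
    return $parts[0];
}
```

For example, before_colon('foo:bar') gives 'foo', and before_colon('nocolon') gives 'nocolon'.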
Your explode solution does the trick. If you really want to use regexes for some reason, you could simply do this:
$newstring = preg_replace("/(.*?):(.*)/", "$1", $string);
A bit more succinct than other examples:
current(explode(':', $string));
You can use RegEx that m.buettner wrote, but his example returns everything BEFORE ':', if you want everything after ':' just use $2 instead of $1:
$newstring = preg_replace("/(.*?):(.*)/", "$2", $string);
You could use something like the following. demo: http://codepad.org/bUXKN4el
<?php
$s = 'anotherfoo:bar;seg98y34g.?sdebvw h segvu';
// array_shift takes its argument by reference, so store the array first.
$parts = explode(':', $s);
$result = array_shift($parts);
echo $result;
?>
Why do you want to use a regex?
list($beforeColon) = explode(':', $string);

Obtain first line of a string in PHP

In PHP 5.3 there is a nice function that seems to do what I want:
strstr($input, "\n", true)
Unfortunately, the server runs PHP 5.2.17 and the optional third parameter of strstr is not available. Is there a way to achieve this in previous versions in one line?
For the relatively short texts, where lines could be delimited by either one ("\n") or two ("\r\n") characters, the one-liner could be like
$line = preg_split('#\r?\n#', $input, 2)[0];
for any sequence before the first line feed, even if it is an empty string,
or
$line = preg_split('#\r?\n#', ltrim($input), 2)[0];
for the first non-empty string.
However, for the large texts it could cause memory issues, so in this case strtok mentioned below or a substr-based solution featured in the other answers should be preferred.
When this answer was first written, almost a decade ago, it had a few subtle shortcomings:
- it was too localized, following the Opening Post with the assumption that the line delimiter is always a single "\n" character, which is not always the case. Using PHP_EOL is not the solution, as we can be dealing with outside data that is not affected by the local system settings
- it assumed that we need the first non-empty string
- there was no way to use either explode() or preg_split() in one line, hence a trick with strtok() was proposed. However, shortly after, thanks to the Uniform Variable Syntax proposed by Nikita Popov, it became possible to use one of these functions in a neat one-liner
As this question gained some popularity, it's better to cover all the possible edge cases in the answer. For historical reasons, here is the original solution:
$str = strtok($input, "\n");
that will return the first non-empty line from the text in the unix format.
However, given that the line delimiters could be different and the behavior of strtok() is not that straightforward ("Delimiter characters at the start or end of the string are ignored", as the man page for the original strtok() function in C says), I would now advise using this function with caution.
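The difference between the variants above shows up with input that starts with blank lines and uses "\r\n" line endings. A short sketch, with a sample $input assumed for illustration:

```php
<?php
// Sample input, assumed for illustration: two leading blank lines,
// then Windows-style line endings.
$input = "\n\nfirst\r\nsecond";

// strtok skips the leading "\n" delimiters and stops at the next "\n",
// so the "\r" that preceded it stays attached to the token.
$viaStrtok = strtok($input, "\n");

// preg_split keeps the empty sequence before the first line feed.
$viaSplit = preg_split('#\r?\n#', $input, 2)[0];

// With ltrim the leading blank lines are dropped first.
$viaSplitTrimmed = preg_split('#\r?\n#', ltrim($input), 2)[0];
```

Here $viaStrtok is "first\r", $viaSplit is an empty string, and $viaSplitTrimmed is "first".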
It's late but you could use explode.
<?php
$lines=explode("\n", $string);
echo $lines[0];
?>
$first_line = substr($fulltext, 0, strpos($fulltext, "\n"));
or something thereabouts would do the trick. Ugly, but workable.
try
substr($input, 0, strpos($input, "\n"))
echo str_replace(strstr($input, "\n"), '', $input);
list($line_1, $remaining) = explode("\n", $input, 2);
Makes it easy to get the top line and the content left behind if you wanted to repeat the operation. Otherwise use substr as suggested.
Not dependent on the type of line-break symbol:
(($pos=strpos($text,"\n"))!==false) || ($pos=strpos($text,"\r"));
$firstline = substr($text,0,(int)$pos);
$firstline now contains the first line of the text, or an empty string if no line-break symbol is found (or if a line-break symbol is the first symbol in the text).
try this:
substr($text, 0, strpos($text, chr(10)))
You can use strpos combined with substr. First you find the position where the character is located and then you return that part of the string.
$pos = strpos($input, "\n");
if ($pos !== false) {
echo substr($input, 0, $pos);
} else {
echo 'String not found';
}
Is this what you want ?
Later edit:
I didn't notice the one-line restriction, so this is not applicable the way it is. You can combine the two functions in just one line as others suggested, or you can create a custom function that will be called in one line of code, as wanted. Your choice.
String manipulation often has to deal with variables that start with a blank line, so decide whether you really want to keep blank lines at the start and end of the string, or trim them. Also, to avoid OS-specific mistakes, use PHP_EOL to find the newline character in a cross-platform-compatible way (When do I use the PHP constant "PHP_EOL"?).
$lines = explode(PHP_EOL, trim($string));
echo $lines[0];
A quick way to get the first n lines of a string, as a string, while keeping the line breaks.
Example: the first 6 lines of $multilinetxt
echo join("\n", array_slice(explode("\n", $multilinetxt), 0, 6));
This can be quickly adapted to catch a particular block of text. The third argument is a length, not an end index, so for lines 10 to 13 (offset 9, 4 lines):
echo join("\n", array_slice(explode("\n", $multilinetxt), 9, 4));
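Because array_splice and array_slice both take an offset and a length rather than a start and an end, a line range has to be converted first: lines 10 to 13 correspond to offset 9 and length 4. A minimal sketch, where lines_between is a hypothetical helper name and array_slice is used because it accepts the result of explode directly (array_splice expects a variable passed by reference):

```php
<?php
// Hypothetical helper: returns lines $from..$to (1-based, inclusive)
// of $text as a single string, keeping "\n" between them.
function lines_between($text, $from, $to)
{
    $lines = explode("\n", $text);
    // Offset is zero-based; length counts both end points.
    $slice = array_slice($lines, $from - 1, $to - $from + 1);
    return implode("\n", $slice);
}
```

Usage: echo lines_between($multilinetxt, 10, 13);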

Pro regex converting these impossible-to-regex examples?

Example of input
vulture (wing)
tabulations: one leg; two legs; flying
father; master; patriarch
mat (box)
pedistal; blockade; pilar
animal belly (oval)
old style: naval
jackal's belly; jester slope of hill (arch)
key; visible; enlightened
Basically, I'm having trouble with some more complicated regex commands. Most of the code I'm finding that uses regex is very simple, but I could use it in so many places if I could get good with it. Would you look at the kind of stuff I'm trying to do and see if you can convert any of it?
1. Arrayize the word or words between the braces, "(" and ")".
2. Arrayize the first words following a new line ending xor four spaces and then a closing brace, ")", and a space and an open brace " (" AND the first words in the document up until a space and an open brace " (".
3. On any line with semicolons, arrayize the words which are separated by semicolons. Get the word or words after the last semicolon but do not get the words after a line break or four consecutive spaces. Words from lines that begin with the string "tabulations:" should not be included in this array, even though lines that begin with the string "tabulations:" have semicolons on them. If a new line ending in a close brace, ")", comes before a line containing semicolons and not starting with "tabulations", add "no alternates" to the array instead.
4. Get the word or words following the colon and preceding the line break on a line that begins with the string "old style:". If a new line ending in a close brace, ")" comes before a "tabulations:"-starting line, add "no old style" to the array, instead.
5. The same as 3, except only for lines that begin with the string "tabulations:". If a new line ending in a close brace, ")" comes before a "tabulations:"-starting line, add "no tabulations" to the array, instead.
I am trying to figure out how to do this via PHP, but I would be happy if anyone could field these requests in any language, especially php, C++, javascript, or batch. I also know that these are all very difficult to show, even for a puzzle lover. So, I promise 100 bonus points as soon as bounties are available for any complete answer.
-Edit-
First solution I was working on
Okay, so the first solution I was working on is to solve 3. I tried breaking the lines at the semicolons, and I was then hoping to grab the data, line-by-line and edit it further.
$input = file_get_contents('explode.txt');
foreach(explode("\n", $input) as $line){
$words = explode(';', $line);
foreach($words as $word){
echo $word;
}
}
Basically, looking at the output, the data ended up in the same format it was already in, only subtract the semicolons. This wasn't very useful, and I decided to stop.
Second solution I am working on
This is based around this line of code: preg_match_all('/\;([^;]+)\}/', $myFile, $matches).
There's now a working solution to part 1 of the question, thanks to EPB and fge:
$myFile = file_get_contents('fakexample.txt');
function get_between($startString, $endString, $myFile){
//Escape start and end strings.
$startStringSafe = preg_quote($startString, '/');
$endStringSafe = preg_quote($endString, '/');
//non-greedy match any character between start and end strings.
//s modifier should make it also match newlines.
preg_match_all("/$startStringSafe(.*?)$endStringSafe/s", $myFile, $matches);
return $matches;
}
$list = get_between("(", ")", $myFile);
// Use a separate loop variable so $list is not overwritten while iterating.
foreach($list[1] as $item){
echo $item."\n";
}
Some issues I had were that I wasn't using RegEx correctly. I think the ArrayArray return problem was because I didn't encapsulate the preg_match_all function such that it returned $matches to a private function. I'm still unsure. I'm also still unsure about whether I should be using the file_get_contents() function to read the file.
The third solution attempt
So, I had an initial idea of how I wanted to approach this, and I decided to go about it my own way. Again, I started with question 1 because it seemed easiest. It has the fewest exceptions
function find_between($input,$start,$end) {
if (strpos($input,$start) === false || strpos($input,$end) === false) {
return false;
} else {
$start_position = strpos($input,$start)+strlen($start);
$end_position = strpos($input,$end);
return substr($input,$start_position,$end_position-$start_position);
}
}
$myFile = file_get_contents('explode.txt');
$output = find_between($myFile,'(',')');
echo $output;
As far as I can tell, this will work. The issue I'm having is with the recursion. I tried foreach($output as $output){echo $output;}, but this gave me an error. It seems obvious to me that it's because I haven't recursed and so haven't arrayized. The reason I stopped along this path is because I was told by several programmers that I was doomed to failure. So, I'm currently back to working on solution 2.
Is this for a homework assignment? These instructions (1-5) are not making any sense to me, as far as when you would have reason to do any of them outside an academic pursuit. It also seems like you're new to not only regexes but also PHP in general. As @Howard pointed out, we will not do your work for you.
Apart from that, if you need help w/regex, I'd be more than happy to assist; however it doesn't appear that that's what you need help with the most.
So here is what I can offer you, with regards to your question:
3) "On any line with semicolons, array-ize the words which are separated by semicolons.
Get the word or words after the last semicolon but do not get the words after a line break or four consecutive spaces. -> Easy: Explode by newline (\n)
Words from lines that begin with the string "tabulations:" should not be included in this array, even though lines that begin with the string "tabulations:" have semicolons on them. -> This is a bit trickier. First, regex for semicolon but NOT colon. This will most likely have to be handled by two separate regexes: first "tabulations:" and if that's NOT found, then search for semicolons. If this regex succeeds, then you can explode by semicolon and now you've got all the data to make all your arrays.
If a new line ending in a close brace, ")" comes before a line containing semicolons and not starting with "tabulations" "no alternates" to the array, instead." -> This one I'm leaving up to you to figure out, for more than a few reasons. ;-)
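A rough sketch of the first two steps described above, under the assumption that each entry of the input sits on its own line. It swaps the two suggested regexes for plain strpos/stripos checks, the helper name semicolon_lists is hypothetical, and the "no alternates" rule and the four-space case are deliberately left out.

```php
<?php
// Hypothetical helper: for every line that contains a semicolon and does not
// start with "tabulations:", return the trimmed semicolon-separated words.
function semicolon_lists($text)
{
    $result = array();
    foreach (preg_split('/\r?\n/', $text) as $line) {
        $line = trim($line);
        if (strpos($line, ';') === false) {
            continue; // no semicolons on this line
        }
        if (stripos($line, 'tabulations:') === 0) {
            continue; // "tabulations:" lines are excluded from this array
        }
        $result[] = array_map('trim', explode(';', $line));
    }
    return $result;
}
```

Each element of the returned array is itself an array of the words from one qualifying line.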

Is it possible to show line numbers when using a regex in PHP?

Is it possible to have a regex that is searching for a string like '\bfunction\b' that will display the line number where it found the match?
There's no simple way to do it, but if you wanted to, you could capture the match offset (using the PREG_OFFSET_CAPTURE flag for preg_match or preg_match_all) and then determine which line that location is in your string by counting how many newlines (for example) occur before that point.
For example:
$matches = array();
preg_match('/\bfunction\b/', $string, $matches, PREG_OFFSET_CAPTURE);
list($capture, $offset) = $matches[0];
$line_number = substr_count(substr($string, 0, $offset), "\n") + 1; // 1st line would have 0 \n's, etc.
Depending on what constitutes a "line" in your application, you might alternately want to search for \r\n or <br> (but that would be a bit more tricky because you'd have to use another regex to account for <br /> or <br style="...">, etc.).
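The same idea extends to every match rather than just the first one. A minimal sketch, assuming lines are separated by "\n"; the helper name match_line_numbers is hypothetical:

```php
<?php
// Hypothetical helper: returns the 1-based line number of every match
// of $pattern in $subject, in order of appearance.
function match_line_numbers($pattern, $subject)
{
    $lineNumbers = array();
    if (preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE)) {
        foreach ($matches[0] as $match) {
            // With PREG_OFFSET_CAPTURE each match is array(text, byte offset).
            list($text, $offset) = $match;
            // Count the newlines that occur before the match.
            $lineNumbers[] = substr_count(substr($subject, 0, $offset), "\n") + 1;
        }
    }
    return $lineNumbers;
}
```

For example, match_line_numbers('/\bfunction\b/', $string) returns an array with one entry per match, with repeated line numbers when a line contains several matches.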
I will suggest something that might work for you,
// Get a file into an array. In this example we'll go through HTTP to get
// the HTML source of a URL.
$lines = file('http://www.example.com/');
// Loop through our array, show HTML source as HTML source; and line numbers too.
foreach ($lines as $line_num => $line) {
// do the regular expression or sub string search here
}
So far as I know it's not, but if you're on Linux or some other Unix-like system, grep will do that and can use (nearly) the same regular expression syntax as the preg_ family of functions with the -P flag.
No. You can pass the PREG_OFFSET_CAPTURE flag to preg_match, which will tell you the offset in bytes. However, there is no easy way to convert this to a line number.
This is not regex, but works:
$offset = strpos($code, 'function');
$lines = explode("\n", substr($code, 0, $offset));
$the_line = count($lines);
Oops! This is not js!

Remove excessive line returns

I am looking for something like trim() but for within the bounds of a string. Users sometimes put 2, 3, 4, or more line returns after they type, I need to sanitize this input.
Sample input
i like cats
my cat is happy
i love my cat
hope you have a nice day
Desired output
i like cats
my cat is happy
i love my cat
hope you have a nice day
I am not seeing anything built in, and a string replace would take many iterations of it to do the work. Before I whip up a small recursive string replace, I wanted to see what other suggestions you all had.
I have an odd feeling there is a regex for this one as well.
function str_squeeze($body) {
return preg_replace("/\n\n+/", "\n\n", $body);
}
How much text do you need to do this on? If it is less than about 100k then you could probably just use a simple search and replace regex (searching something like /\n+/ and replace with \n)
On the other hand, if you need to go through megabytes of data, then you could parse the text character by character, copying the input to the output, except when multiple newlines are encountered, in which case you would just copy one newline and ignore the rest.
I would not recommend a recursive string replace though, sounds like that would be very very slow.
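A minimal sketch of the character-by-character approach described above. The helper name squeeze_newlines and the $max parameter are assumptions: with $max set to 1 every run of newlines collapses to a single newline, as described, while the default of 2 keeps one blank line between paragraphs.

```php
<?php
// Hypothetical helper: copies $text to the output, but never lets more than
// $max consecutive "\n" characters through.
function squeeze_newlines($text, $max = 2)
{
    $out = '';
    $run = 0; // length of the current run of newlines
    $length = strlen($text);
    for ($i = 0; $i < $length; $i++) {
        $char = $text[$i];
        if ($char === "\n") {
            $run++;
            if ($run > $max) {
                continue; // skip the excess newline
            }
        } else {
            $run = 0;
        }
        $out .= $char;
    }
    return $out;
}
```

This makes a single pass over the input and only handles "\n"; input with "\r\n" line endings would need to be normalized first.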
Finally managed to get it, needs preg so you are using the PCRE version in php, and also needs a \n\n replacement string, in order to not wipe all line endings but one:
$body = preg_replace("/\n\n+/", "\n\n", $body);
Thanks for getting me on the right track.
To consider all three line break sequences:
preg_replace('/(?:\r\n|[\r\n]){2,}/', "\n\n", $str)
The following regular expression should remove multiple linebreaks while ignoring single line breaks, which are okay by your definition:
ereg_replace("\n\n+", "\n\n", $string);
You can test it with this PHP Regular Expression test tool, which is very handy (but as it seems not in perfect parity with PHP).
[EDIT] Fixed the ' to ", as they didn't seem to work. Have to admit I just tested the regex in the web tool. ;)
