php - regex exact matches - php

I have the following strings:
Falchion-Case
P90-ASH-WOOD-WELL-WORN
I also have the following URLS which are inside a text file:
http://csgo.steamanalyst.com/id/115714004/FALCHION-CASE-KEY-
http://csgo.steamanalyst.com/id/115716486/FALCHION-CASE-
http://csgo.steamanalyst.com/id/2018/P90-ASH-WOOD-WELL-WORN
I'm looping through each line in the text file and checking if the string is found inside the URL:
// Skip StatTrak items
if (stristr($item, "stattrak") === FALSE) {
    // Read from file
    $lines = file(public_path().'/csgoanalyst.txt');
    foreach ($lines as $line) {
        // Check if the line contains the string we're looking for, and print if it does
        if (preg_match('/(?<!\-)('.$item.')\b/i', $line)) { // i flag: case insensitive
            echo $line;
            break;
        }
    }
}
This works perfectly when $item = P90-ASH-WOOD-WELL-WORN. However, when $item = Falchion-Case, it matches both URLs, when only the second (http://csgo.steamanalyst.com/id/115716486/FALCHION-CASE-) is valid.

Try modifying your regex to match at the end of the line, assuming the item name is the last thing on the line:
'/('.$item.')$/i'
The $ anchors the match here:
http://csgo.steamanalyst.com/id/115714004/FALCHION-CASE-KEY- <<end of line
Basically this is an "ends with" type of match. You can also use
'/('.$item.')\-?$/i'
to optionally match an ending hyphen.

You can also use a negative lookahead to negate that unwanted case:
preg_match('/(?<!-)'. preg_quote($item, '/') .'(?!-\w)/i', $line);
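
To see the lookahead at work, here is a minimal sketch run against the three URLs from the question. The helper name findItemUrl is made up for illustration, and the URLs are held in an array rather than read from csgoanalyst.txt.

```php
<?php
// Hypothetical helper: returns the first line containing $item as a complete
// item name, or null when no line qualifies.
function findItemUrl(array $lines, string $item): ?string
{
    // (?<!-)  : the name must not be preceded by a hyphen
    // (?!-\w) : the name must not continue with "-" plus a word character
    $pattern = '/(?<!-)' . preg_quote($item, '/') . '(?!-\w)/i';
    foreach ($lines as $line) {
        if (preg_match($pattern, $line) === 1) {
            return $line;
        }
    }
    return null;
}

$urls = [
    'http://csgo.steamanalyst.com/id/115714004/FALCHION-CASE-KEY-',
    'http://csgo.steamanalyst.com/id/115716486/FALCHION-CASE-',
    'http://csgo.steamanalyst.com/id/2018/P90-ASH-WOOD-WELL-WORN',
];

echo findItemUrl($urls, 'Falchion-Case'), PHP_EOL;
echo findItemUrl($urls, 'P90-ASH-WOOD-WELL-WORN'), PHP_EOL;
```

The trailing hyphen in FALCHION-CASE- is allowed because the lookahead only rejects a hyphen that is followed by a word character, which is what distinguishes it from FALCHION-CASE-KEY-.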

Related

PHP stripos() return only false

I'm making a bad-word filter that returns true if there is a bad word in a string.
But whatever the user writes, the code returns false.
I already tried converting the arguments of stripos() to strings (just in case), but it still fails.
I also tried preg_match() with "/$word/i", $_POST['message'].
Here is my checking function:
function MessageBad(){
    $BadWord = false;
    $bannedwords = file("bannedwords");
    foreach($bannedwords as $word) {
        if(stripos($_POST['message'], $word) !== false){
            $BadWord = true;
        }
    }
    return $BadWord;
}
But stripos($_POST['message'], $word) !== false is always false, even when I enter only a bad word from the bannedwords list...
By default, the strings returned by file() include the newline character at the end of each line. So $word ends with a newline, and will only match if the bad word is at the end of the line.
Use the FILE_IGNORE_NEW_LINES flag to remove the newlines.
$bannedwords = file("bannedwords", FILE_IGNORE_NEW_LINES);
You should also break out of the loop once you find a match; there's no need to keep checking other words.
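
A minimal sketch of both suggestions together. The function name containsBannedWord and the temporary file standing in for the bannedwords file are assumptions made so the example is self-contained; the message is passed as a parameter instead of being read from $_POST.

```php
<?php
// Hypothetical rewrite of the check: returns as soon as one banned word is found.
function containsBannedWord(string $message, array $bannedWords): bool
{
    foreach ($bannedWords as $word) {
        // Guard against empty entries; an empty needle would match everything in PHP 8.
        if ($word !== '' && stripos($message, $word) !== false) {
            return true;
        }
    }
    return false;
}

// Temporary stand-in for the "bannedwords" file (assumed: one word per line).
$path = tempnam(sys_get_temp_dir(), 'ban');
file_put_contents($path, "darn\nheck\n");

$withNewlines = file($path);  // each entry still ends with "\n"
$clean = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
unlink($path);

var_dump(containsBannedWord('What the HECK is this', $withNewlines)); // bool(false)
var_dump(containsBannedWord('What the HECK is this', $clean));        // bool(true)
```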

Reading a file and extracting data using a regex in PHP

I am trying to echo out the names/paths of the files that are written in logfile.txt. For that, I use a regex to match everything before the first occurrence of : and output it. I am reading logfile.txt line by line:
<?php
$logfile = fopen("logfile.txt", "r");
if ($logfile) {
    while (($line = fgets($logfile)) !== false) {
        if (preg_match_all("/[^:]*/", $line, $matched)) {
            foreach ($matched as $val) {
                foreach ($val as $read) {
                    echo '<pre>'. $read . '</pre>';
                }
            }
        }
    }
    fclose($logfile);
} else {
    die("Unable to open file.");
}
?>
However, I get the entire contents of the file instead. The desired output would be:
/home/user/public_html/an-ordinary-shell.php
/home/user/public_html/content/execution-after-redirect.html
/home/user/public_html/paypal-gateway.html
Here is the content of logfile.txt:
-------------------------------------------------------------------------------
/home/user/public_html/an-ordinary-shell.php: Php.Trojan.PCT4-1 FOUND
/home/user/public_html/content/execution-after-redirect.html: {LDB}VT-malware33.UNOFFICIAL FOUND
/home/user/public_html/paypal-gateway.html: Html.Exploit.CVE.2015_6073
Extra question: How do I skip reading the first two lines (namely the dashes and the empty line)?
Here you go:
<?php
# to load the log file as a string:
# $data = file_get_contents("logfile.txt");
# sample data for this specific purpose
$data = <<<DATA
-------------------------------------------------------------------------------
/home/user/public_html/an-ordinary-shell.php: Php.Trojan.PCT4-1 FOUND
/home/user/public_html/content/execution-after-redirect.html: {LDB}VT-malware33.UNOFFICIAL FOUND
/home/user/public_html/paypal-gateway.html: Html.Exploit.CVE.2015_6073
DATA;
$regex = '~^(/[^:]+):~m';
# ^       - anchor to the beginning of a line
# /       - a slash
# ([^:]+) - capture one or more characters that are NOT a colon
# m       - multiline mode, so ^ matches at the start of every line
preg_match_all($regex, $data, $files);
# $files[1] holds the captured paths
print_r($files);
?>
It even skips both of your extra lines; see a demo on ideone.com.
preg_match_all returns all occurrences of the pattern. For the first log line, it returns the path /home/user/public_html/an-ordinary-shell.php, an empty string, " Php.Trojan.PCT4-1 FOUND", and another empty string: every substring that doesn't contain a colon.
To obtain a single result, use preg_match, although explode is sufficient here.
To skip the lines you don't want, you can, for example, build a generator function that yields only the good lines. You can also use a stream filter.
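
A rough sketch of the generator idea combined with explode. The function name logEntries and the rule for a "good" line (it starts with a slash and contains a colon) are assumptions; the log lines are inlined here instead of being read from logfile.txt.

```php
<?php
// Hypothetical generator: yields only lines that look like "/path: result".
function logEntries(iterable $lines): Generator
{
    foreach ($lines as $line) {
        $line = rtrim($line, "\r\n");
        // Skip the dashed separator, blank lines, and anything without a colon.
        if ($line === '' || $line[0] !== '/' || strpos($line, ':') === false) {
            continue;
        }
        yield $line;
    }
}

$lines = [
    "-------------------------------------------------------------------------------\n",
    "\n",
    "/home/user/public_html/an-ordinary-shell.php: Php.Trojan.PCT4-1 FOUND\n",
    "/home/user/public_html/paypal-gateway.html: Html.Exploit.CVE.2015_6073\n",
];

$paths = [];
foreach (logEntries($lines) as $entry) {
    // A limit of 2 splits on the first colon only.
    [$path] = explode(':', $entry, 2);
    $paths[] = $path;
}
print_r($paths);
```

With a real file, the array could be replaced by the result of file('logfile.txt'), since the generator accepts any iterable of lines.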

Test pipe delimited format with regular expression in php

I need to verify the string is in the following format, otherwise I need to return a warning message.
purple|grape juice
green|lettuce is good in salad
yellow|monkeys like bananas
red|water melon is delicious
I need pipe-delimited data like the above: each line is split into two chunks, followed by a new line.
If there are 3 pipes in one line, then it is not correct.
This RegEx will validate the whole string to your specs: ^(\w+\|[\w ]+(\n|$))+$
If one of the lines is invalid, preg_match() will return 0 (it returns false only on error), so a simple falsy check is enough.
Note: I assumed the left part contains only word characters (letters, digits, underscore) and the right part the same plus spaces.
See it live here: http://regex101.com/r/iV0tC4
So all the code you need is:
if (!preg_match($regex,$string)) trigger_error("Invalid string!",E_USER_WARNING);
Live code: http://codepad.viper-7.com/i3Hcjs
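
For reference, a small sketch of that whole-string check with pattern delimiters added. The function name isPipeFormat is made up, and the character classes carry the same assumption as the note above (word characters on the left, word characters and spaces on the right).

```php
<?php
// Hypothetical wrapper around the pattern above; returns true only when
// every line has the form "word|words and spaces".
function isPipeFormat(string $text): bool
{
    $regex = '/^(\w+\|[\w ]+(\n|$))+$/';
    return preg_match($regex, $text) === 1;
}

$good = "purple|grape juice\ngreen|lettuce is good in salad";
$bad  = "purple|grape juice\nred|water|melon";

var_dump(isPipeFormat($good)); // bool(true)
var_dump(isPipeFormat($bad));  // bool(false)
```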
I'm not sure what you are trying to get, but my guess is that you need something like this:
<?php
$lines = explode("\n", $str);
foreach ($lines as $lineIndex => $oneLine) {
    // more than two chunks means more than one pipe on the line
    if (count(explode('|', $oneLine)) > 2) echo "You have an error in line ".$lineIndex;
}
?>
A RegExp match is all you need. Split the string by new lines, then run the RegExp match on each line, if the match returns false then end the function early (short circuit). The RegExp is essentially START_OF_LINE (^) and anything not a pipe ([^\|]) and a pipe (\|) and anything not a pipe ([^\|]) and END_OF_LINE ($).
function verify($str) {
    $regex = '/^[^\|]*\|[^\|]*$/';
    $all = explode("\n", $str);
    foreach($all as $line) {
        if(preg_match($regex, $line) == false)
            return false;
    }
    return true;
}
echo verify("bad") === false;
echo verify("bad|bad|bad") === false;
echo verify("abc|123\nbad") === false;
echo verify("abc|123\nbad|bad|bad") === false;
echo verify("good|good") === true;
echo verify("good|good\nnice|nice") === true;

Is it possible to get image URLs from the start of a string in PHP?

I have an example string as follows
$string = '
http://image.gsfc.nasa.gov/image/image_launch_a5.jpg
http://pierre.chachatelier.fr/programmation/images/mozodojo-original-image.jpg
http://image.gsfc.nasa.gov/image/image_launch_a5.jpg
Alot of text
http://www.google.com/intl/en_ALL/images/logos/images_logo_lg.gif
more text';
I want to be able to extract the URLs of the first three images (basically whatever number of images are at the start of the string) but not extract any image URLs once my non-image text starts. I can successfully use regex to grab all the image URLs, but it also grabs the last google.com image, which is inside the text.
Thanks for any ideas!!
Let R be a regex that matches one image URL.
Anchored at the start of the string, you need to match (R)+, i.e. one or more occurrences of R,
or more likely ((R)(w)?)+,
where w is a regular expression that matches whitespace.
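
One way this could look in PHP, as a sketch only: the function name leadingImageUrls and the list of image extensions are assumptions, and the pattern for R is deliberately simple.

```php
<?php
// Hypothetical helper: returns the image URLs that open the text and stops
// at the first non-URL content.
function leadingImageUrls(string $text): array
{
    // R: one image URL (the extension list is an assumption).
    $r = 'https?://\S+?\.(?:jpe?g|png|gif)';
    // \A anchors at the start; each URL must be followed by whitespace or the end.
    if (!preg_match('~\A\s*(?:' . $r . '(?:\s+|\z))+~i', $text, $m)) {
        return [];
    }
    // Pull the individual URLs out of the matched prefix only.
    preg_match_all('~' . $r . '(?=\s|\z)~i', $m[0], $urls);
    return $urls[0];
}

$string = '
http://image.gsfc.nasa.gov/image/image_launch_a5.jpg
http://pierre.chachatelier.fr/programmation/images/mozodojo-original-image.jpg
http://image.gsfc.nasa.gov/image/image_launch_a5.jpg
Alot of text
http://www.google.com/intl/en_ALL/images/logos/images_logo_lg.gif
more text';

print_r(leadingImageUrls($string));
```

Because the first match is anchored with \A, the google.com URL further down is never reached, even though it would match R on its own.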
How about avoiding regex and using explode instead?
$string = '....';
$urls = array();
$lines = explode(PHP_EOL, $string);
foreach ($lines as $line) {
    $line = trim($line);
    // ignore empty lines
    if (strlen($line) === 0) continue;
    $pUrl = parse_url($line);
    // stop at the first line that is not a valid URL with a path
    if ($pUrl === false || !isset($pUrl['path'])) break;
    // also stop at URLs whose path does not end in .jpg
    if (stripos($pUrl['path'], '.jpg') !== (strlen($pUrl['path']) - 4)) break;
    // anything left is a valid URL and an image
    // also, because a non-url fails and we skip empty lines, the first line
    // that isn't an image will break the loop, thus stopping the capture
    $urls[] = $line;
}
var_dump($urls);
Example at IDEOne

Manually move the fgetc file pointer to the next line

Question 1: How can I manually move the fgetc file pointer from its current location to the next line?
I'm reading in data character by character until a specified number of delimiters are counted. Once the delimiter count reaches a certain number, it needs to copy the remainder of the line until a new line (the record delimiter). Then I need to start copying character by character again starting at the next record.
Question 2: Is manually moving the file pointer to the next line the right idea? I would just explode(at "\n") but I have to count the pipe delimiters first because "\n" isn't always the record delimiter.
Here's my code (it puts all the data into the correct record until it reaches the last delimiter '|' in the record. It then puts the rest of the line into the next record because I haven't figured out how to make it correctly look for the '\n' after specified # of | are counted):
$file=fopen("source_data.txt","r") or exit ("File Open Error");
$record_incrementor = 0;
$pipe_counter = 0;
while (!feof($file))
{
    $char_buffer = fgetc($file);
    $str_buffer[] = $char_buffer;
    if($char_buffer == '|')
    {
        $pipe_counter++;
    }
    if($pipe_counter == 46) //Maybe Change to 46
    {
        $database[$record_incrementor] = $str_buffer;
        $record_incrementor++;
        $str_buffer = NULL;
        $pipe_counter = 0;
    }
}
Sample Data:
1378|2009-12-13 11:51:45.783000000|"Pro" |"B13F28"||""|1||""|""|""|||False|||""|""|""|""||""||||||2010-12-15 11:51:51.330000000|108||||||""||||||False|""|""|False|""|||False
1379|2009-12-13 12:23:23.327000000|"TLUG"|"TUG"||""|1||""|""|""|||False|||""|""|""|""||""||||||1943-04-19 00:00:00|||||||""||||||False|""|""|False|""|||False
I'd say that doing this via file handling functions is a bit clumsy, when it could be done via regular expression quite easily. Just read the entire file into a string using file_get_contents(); a regular expression like /^(([^|]*\|){47}([^\r\n]*))/m with preg_match_all() then finds all the rows, which you can explode() using | as the delimiter and 48 as the limit for the number of fields.
Here is a working example function. The function takes the file name, the field delimiter and the number of fields per row as arguments. It returns a two-dimensional array where the first index is the data row number and the second is the field number.
function loadPipeData ($file, $delim = '|', $fieldCount = 48)
{
    $contents = file_get_contents($file);
    $d = preg_quote($delim, '/');
    preg_match_all("/^(([^$d]*$d){" . ($fieldCount - 1) . '}([^\r\n]*))/m', $contents, $match);
    $return = array();
    foreach ($match[0] as $line)
    {
        $return[] = explode($delim, $line, $fieldCount);
    }
    return $return;
}
var_dump(loadPipeData('source_data.txt'));
(Note: this is a solution to the original problem)
You can read to the end of the line like this:
while (!feof($file) && fgetc($file) !== "\n"); // double quotes: '\n' in single quotes is a backslash followed by n
As for whether or not fgetc is the right way to do this... your format makes it difficult to use anything else. You can't split on \n, because there may be newlines within a field, and you can't split on |, because the end of the record doesn't have a pipe.
The only other option I can think is to use preg_match_all:
$buffer = file_get_contents('test.txt');
preg_match_all('/((?:[^|]*\|){45}[^\n]*\n)/', $buffer, $matches);
foreach ($matches[0] as $row) {
    $fields = explode('|', $row);
}
Answer to the modified question:
To read from the file pointer to the end of the line, you can simply use the file reading function fgets(). It returns everything from the current file pointer position until it reaches the end of the line (and also returns the end of the line character(s)). After the function call, the file reading pointer has been moved to the beginning of the next line.
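
A sketch of how fgetc() and fgets() could be combined along those lines. The function name readRecords is hypothetical, and the example uses an in-memory stream with three pipes per record instead of source_data.txt and its real delimiter count.

```php
<?php
// Hypothetical helper: a record is $pipesPerRecord pipes followed by a tail
// that runs to the next newline. Newlines before the last pipe stay in the record.
function readRecords($handle, int $pipesPerRecord): array
{
    $records = [];
    $buffer = '';
    $pipes = 0;
    while (($char = fgetc($handle)) !== false) {
        $buffer .= $char;
        if ($char === '|' && ++$pipes === $pipesPerRecord) {
            // All delimiters seen: fgets() returns the rest of the line and
            // leaves the pointer at the start of the next record.
            $rest = fgets($handle);
            $buffer .= rtrim($rest === false ? '' : $rest, "\r\n");
            $records[] = $buffer;
            $buffer = '';
            $pipes = 0;
        }
    }
    return $records;
}

// In-memory stand-in for the data file; the first record contains an embedded newline.
$handle = fopen('php://memory', 'w+');
fwrite($handle, "1|a\nb|c|end one\n2|d|e|end two\n");
rewind($handle);
$records = readRecords($handle, 3);
fclose($handle);

print_r($records);
```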
