PHP: Searching for an exact string pattern in array elements

I want to validate all the files that are being uploaded on my site through PHP. I am using regular expressions to compare the file contents but it doesn't seem to be working as I expect it to work. I want to accept files with 1 term per line only.
EXPECTED INPUT:
HP34930
HP09099
HP98899
UNACCEPTABLE INPUT:
HP89980 HP98798 HP09232
some other text
HP58089
Here is my code:
$texthandle = file($_FILES["textfile"]["tmp_name"]);
foreach ($texthandle as $textline)
{
if (!preg_match("/(HP\d+){1}/", $textline))
{
echo "Incorrect file format. Please provide a text file with 1 term per line.";
exit(0);
}
}
Could someone suggest why this isn't detecting the way I want it to?
I have also tried
if (!preg_match("/^(HP\d+){1}$/", $textline))
but it isn't working as I expect it to work.
Thanks for your help!

I'm not sure what "not working" means here, but file() keeps the line ending on each element by default, so try:
$texthandle = file($_FILES["textfile"]["tmp_name"], FILE_IGNORE_NEW_LINES);
and then:
if (!preg_match("/^HP\d+$/", $textline))
or if there are only 5 digits allowed:
if (!preg_match("/^HP[\d]{5}$/", $textline))
If there is any whitespace at the end (spaces, tabs, a stray \r from Windows line endings, etc.), the match will fail, so you can try trim() on $textline first.
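Putting the two suggestions together, a minimal sketch might look like the following. The function name isValidTermFile is made up for illustration, and the sample lines are invented; the pattern assumes a term is "HP" followed by one or more digits.

```php
<?php
// Hypothetical helper: returns true only if every line is exactly one HP term.
function isValidTermFile(array $lines): bool
{
    foreach ($lines as $line) {
        // trim() strips spaces, tabs, "\r" and "\n" from both ends.
        // The D modifier makes $ match only at the very end of the string.
        if (!preg_match('/^HP\d+$/D', trim($line))) {
            return false;
        }
    }
    return true;
}

var_dump(isValidTermFile(["HP34930\r\n", "HP09099\r\n", "HP98899"])); // one term per line
var_dump(isValidTermFile(["HP89980 HP98798 HP09232\n"]));             // several terms on one line
var_dump(isValidTermFile(["some other text\n", "HP58089\n"]));        // free text
```

Note that a blank line at the end of the file would be rejected by this sketch; passing FILE_SKIP_EMPTY_LINES together with FILE_IGNORE_NEW_LINES to file() is one way to avoid that.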

Related

PHP variables look the same but are not equal (I'm confused)

OK, so I shave my head, but if I had hair I wouldn't need a razor because I'd have torn it all out tonight. It's gone 3am and what looked like a simple solution at 00:30 has become far from it.
Please see the code extract below..
$psusername = substr($list[$count],16);
if ($psusername == $psu_value){
$answer = "YES";
}
else {
$answer = "NO";
}
$psusername holds the value "normann" which is taken from a URL in a text based file (url.db)
$psu_value also holds the value "normann" which is retrieved from a cookie set on the user's computer (or a parameter in the browser address bar - URL).
However, and I'm sure you can guess my problem, the variable $answer contains "NO" from the test above.
All the PHP I know I've picked up from Google searches and you guys here, so I'm no expert, which is perhaps evident.
Maybe this is a schoolboy error, but I cannot figure out what I'm doing wrong. My assumption is that the data types differ. Ultimately, I want to compare the two variables and have a TRUE result when they contain the same information (i.e normann = normann).
So if you very clever fellows can point out why two variables echo what appears to be the same information but are in fact different, it'd be a very useful lesson for me and make my users very happy.
Do they echo the same thing when you do:
echo gettype($psusername) . "\n" . gettype($psu_value);
Since I can't see what data is stored in the array $list (and the index $count), I cannot suggest a full solution to your problem.
But I can suggest inserting this code right before the if statement:
var_dump($psusername);
var_dump($psu_value);
and seeing why the two variables are not identical.
The var_dump function outputs the content stored in the variable along with its type (string, integer, array, etc.) and, for strings, the length, so you will be able to figure out why the if statement is returning false.
Since it looks like you have non-printable characters in your string, you can strip them out before the comparison. This will remove whatever is not printable in your character set:
$psusername = preg_replace("/[[:^print:]]/", "", $psusername);
0D 0A is a Windows-style line ending. The first byte is the carriage return (CR) character and the second is the line feed (LF) character. They are also written as \r and \n.
You can just trim it off using trim().
$psusername = trim($psusername);
Or if it only occurs at the end of the string then rtrim() would do the job:
$psusername = rtrim($psusername);
If you are getting the values from the file using file() then you can pass FILE_IGNORE_NEW_LINES as the second argument, and that will remove the new line:
$contents = file('url.db', FILE_IGNORE_NEW_LINES);
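To see the effect in isolation, here is a small sketch with made-up values: $fromFile stands in for a line read from url.db without any flags, and $fromCookie for the cookie value. Both variable names are invented for this example.

```php
<?php
$fromFile   = "normann\r\n"; // made-up value, as file() returns it without flags
$fromCookie = "normann";

var_dump($fromFile == $fromCookie);         // bool(false): the line ending is part of the string
echo bin2hex($fromFile), "\n";              // 6e6f726d616e6e0d0a (note the trailing 0d0a)
var_dump(rtrim($fromFile) === $fromCookie); // bool(true) once the line ending is removed
```

Dumping the bytes with bin2hex() is a quick way to spot characters that echo does not make visible.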
I just want to thank all who responded. After viewing the outputs in hex format in my logfile, I realised that the carriage return characters were causing the variables to mismatch. As I mentioned, I was able to resolve it (trim them) with the following code:
$psusername = preg_replace("/[^[:alnum:]]/u", '', $psusername);
I also know that the system within which the profiles and usernames are created allow both upper and lower case values to match, so I took the precaution of building that functionality into my code as an added measure of completeness.
And I'm happy to say, the code functions perfectly now.
Once again, thanks for your responses and suggestions.

Using non standard characters in associative array

Good day all! I am working on a parser for a chat room that can color text based on who was talking, for archive purposes. I have it working perfectly, except now the administrator wants to be able to remove the "fancy" names and replace them with more readable versions for some of their regular people.
The chat room allows an extended range of letters and symbols that, when transferred to an RTF file, may not transfer correctly.
I can't get it to work, and don't see any reason why it should not.
This is an example of what I have:
$nameconvert = array(
"îrúål__Þħōþħ" => "Eriel__Thoth",
);
***Scripting that parses an uploaded text
file line by line, each line places in an
array using space as delimiter... thus
name of person talking is $row_data[0]***
$name = $row_data[0];
$name = $nameconvert[$name];
** Code to throw everything back together **
Now, this is just a simplified snippet, but for whatever reason, it does not work. If I do $name = $nameconvert['îrúål__Þħōþħ'] then it does work, which tells me that the name I'm putting in the script and the name being pulled from my text file are two different things, though they are visually identical.
HELP!
I have found the answer, and wish to share my solution to others.
This is the modified code
$nameconvert = array(
"0123456789abcdef" => "Eriel__Thoth",
);
***Scripting that parses an uploaded text
file line by line, each line places in an
array using space as delimiter... thus
name of person talking is $row_data[0]***
$name = $row_data[0];
$key = bin2hex(mb_convert_encoding($name,"UTF-8"));
$name = $nameconvert[$key];
** Code to throw everything back together **
The command bin2hex(mb_convert_encoding($name,"UTF-8")) takes the name from the file, ensures it is in UTF-8 format, then creates its hexadecimal equivalent. That hexadecimal string is then used as the array key to look up the easier-to-read name.
It works just the way I am wanting!
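One possible reason two names can look identical but fail as array keys is that they are stored in different encodings. The sketch below is only an illustration of that idea, not a diagnosis of the chat log above: it assumes the same letter "î" arrives once as ISO-8859-1 and once as UTF-8, and it requires the mbstring extension.

```php
<?php
$latin1 = "\xEE";     // "î" encoded as ISO-8859-1 (one byte)
$utf8   = "\xC3\xAE"; // "î" encoded as UTF-8 (two bytes)

var_dump($latin1 === $utf8);  // bool(false): different bytes, so different array keys
echo bin2hex($latin1), "\n";  // ee
echo bin2hex($utf8), "\n";    // c3ae

// Converting with an explicit source encoding makes the two comparable.
$converted = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
var_dump($converted === $utf8); // bool(true)
```

Printing bin2hex() of both the script literal and the value read from the file shows whether the bytes really match.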

Handling text file with unknown newline positions

My problem is simple: I have a text file which I process line by line, inserting the data into a database and doing some other work for each line. The problem is that the text file is a log of SMSes received in my gateway, and I expect one line per SMS. If an SMS does not have any new lines in its body, everything is all right. On the other hand, if an SMS is sent like this:
"Test
TestOnANewLine"
I get a log file in which the entry is broken across several lines, one for every line break in the message. A sample follows:
2012-01-01 10:10:10,4C64DCD6.req,192.168.999.999,+12223334444,OK -- SMPP - 999.999.999.999:9999,SubmitUser=user;Sender=sender;SMSCMsgId=999999999;Text="Test1
NewLineTest
AnotherNEwLineTEst"
The log file is interpreted like this:
date time, smsid, ip that processed it, number that is being sent to, status --connection type - ip that is sent from, user that submitted; sender name that is displayed; sms connection id; body of the sms
As for the language, I am using PHP, and the processing is a simple
foreach($lines as $line)
{ explode and do stuff }
How do I handle this situation? At this point any help is appreciated
Thanks in advance!!
fgetcsv could handle the line breaks enclosed in '"', but with an additional '"' character in the body it would fail...
So what about some irresponsible regexp usage?
preg_match_all('#^(\d{4}-\d{2}-\d{2}[^,]+),([^,]+),([^,]+),([^,]+),([^,]+),SubmitUser=([^;]+);Sender=([^;]+);SMSCMsgId=([^;]+);Text="([\w\d\s\.\-,:;\'"]+)"$#im', $file, $matches);
This should do the job for texts that are not too crazy. You may want to adapt the [\w\d\s\.\-,:;\'"] character class to your needs.
Couldn't you loop through the newlines until you can parse a date from it?
Maybe take into account that the previous line ended with a double quote ?
I know it's not foolproof, but without some recognisable "end of message" character(s), this is the best I could think of :P
First of all, thank you for all the feedback; it was really valuable and helped me solve this issue. For anyone else who finds this post and wants a solution, here is mine:
I changed the delimiter I use for the end of a record from the regular \r\n to \r\n2. That is, I treat a line break as the start of a new record if and only if a regular \r\n is followed by a 2 on the next physical line (the first digit of the year). Note that explode() drops the delimiter, so every record after the first loses its leading 2.
The actual solved part is:
$data = file_get_contents($backup_file);
$lines=explode("\r\n2",$data);
foreach($lines as $line)
{
//explode and do stuff
}
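A variation on the same idea, sketched here under the assumption that every record starts with a YYYY-MM-DD date followed by a space: preg_split() with a lookahead splits at the same places without consuming the first character of the next record. The sample data is made up and shortened from the log format above.

```php
<?php
// Made-up sample with Windows line endings, modeled on the log format above.
$data = "2012-01-01 10:10:10,a.req,Text=\"Test1\r\nNewLineTest\"\r\n"
      . "2012-01-01 10:10:11,b.req,Text=\"Single\"\r\n";

// Split at a line break only when the next line starts with a date.
// The lookahead (?=...) is not consumed, so each entry keeps its leading "2".
$entries = preg_split('/\r?\n(?=\d{4}-\d{2}-\d{2} )/', rtrim($data));

foreach ($entries as $entry) {
    echo $entry, "\n---\n";
}
```

This also copes with files that use \n alone, and it does not depend on the year starting with 2.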
Try this to get all the log entries normalized into a single array item per log entry (i.e. combine entries across multiple line breaks into a single item)
$line_array = file('/path/to/file');
$log_array = array();
$i = -1;
$date_pattern = '/^[0-9]{4}-[0-9]{2}-[0-9]{2}\s[0-9]{2}:[0-9]{2}:[0-9]{2}/';
foreach ($line_array as $line) {
if (1 === preg_match($date_pattern, $line)) {
// this is a new log entry
// let's trim the whitespace from the end of the last log array entry since we are done with it
if(isset($log_array[$i])) {
$log_array[$i] = rtrim($log_array[$i]);
}
// start a new log array entry
$i++;
$log_array[$i] = $line;
} else {
// this is not a new log entry
$log_array[$i] .= $line;
}
}
After that you should be able to work with $log_array to extract the data you need. By the way, when you loop through $log_array, it would probably be helpful to extract the message text first. If you do a greedy preg_match on the double quotes, you shouldn't have any problems with messages that have quotes within them, as the greedy match will find the largest possible matching string, which in your case would be everything between the quotes bounding the message content.
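The greedy match described above could be sketched as follows. The entry is made up, and the pattern assumes the message text is the last field and is introduced by Text=" as in the sample log.

```php
<?php
// Made-up entry with quotes and a line break inside the message body.
$entry = "2012-01-01 10:10:10,4C64DCD6.req,SMSCMsgId=999999999;Text=\"He said \"hi\"\nSecond line\"";

// .* is greedy and the s modifier lets the dot cross line breaks, so the
// capture runs from the first quote after Text= to the last quote in the entry.
if (preg_match('/Text="(.*)"/s', $entry, $m)) {
    $text = $m[1];
    echo $text, "\n";
}
```

Once the text has been captured, the remaining fields can be split on commas and semicolons without worrying about the message content.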

Why is fopen() behaving like this?

I am currently working on a project which requires me to write a function that dynamically decides the directory name and then creates a simple .txt file in that directory.
my code is as follows:
($destinatario is a string)
$diretorio="../messages/".$destinatario;
if (is_dir($diretorio)) {
;
}else{
mkdir($diretorio);
}
$path=$diretorio."/".$data.",".$assunto.",".$remetente.".txt";
$handle=fopen($path,'w+');
fwrite($handle, $corpo);
fclose($handle);
Never mind the Portuguese; the bottom line is that it should create a .txt file using the naming scheme I've provided. The funny thing is that when I do this, PHP creates a weird file whose filename is "01.09.2010 04"
(with no extension at all), which amounts to the first few characters of the actual filename I'd like to create...
Edit: $data is actually the output of a call to date("d.m.Y H:i").
Per comment by OP:
[$data is] actually the output of a call to date("d.m.Y H:i")
The problem is the : character. (Still, there may be other illegal characters in the other parts composing the final file name.)
EDIT
The essence of the problem and solution is in the comments to @tchen's answer. Keep in mind that colon is a valid file name character on (some? all?) *nix platforms but is invalid on Windows.
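One way to avoid the problem is to replace the characters Windows rejects before building the path. The sketch below is an assumption about how that might be done: safeFileName is a made-up helper name, and the subject and sender parts are placeholders.

```php
<?php
// Hypothetical helper: replace characters that Windows rejects in file names.
function safeFileName(string $name): string
{
    // \ / : * ? " < > | are not allowed in Windows file names.
    return preg_replace('#[\\\\/:*?"<>|]#', '_', $name);
}

$data = date('d.m.Y H:i');
echo safeFileName($data . ',subject,sender.txt'), "\n";
```

Alternatively, choosing a date format without a colon in the first place, such as date("d.m.Y H.i"), sidesteps the issue for the date part.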
Make sure there's no bad characters at the end of $data. Call trim() on it.
If it's data taken from a file, it may have a '\r' or '\n' at the end of it.
Not related, but make sure your if statements don't have unused conditions:
if (!is_dir($diretorio)) {
mkdir($diretorio);
}
This will also get rid of that blank line with a single terminator ;, I'm sure that isn't right.
Some ideas:
have you tried not using commas in the filename?
Have you checked the return values of fopen and fwrite?
Just to try to isolate the problem
also you can simplify to:
if (!is_dir($diretorio)) {
mkdir($diretorio);
}

Preg_match help : selecting files in a folder

I have the following code that selects all the different template files out of a folder... The file names I have are:
template_default.php
template_default_load.php
template_sub.php
template_sub_load.php
I only want to select the ones without the _load in the file name so I used this code:
preg_match('/^template_(.*)[^[_load]]{0}\.php$/i', $layout_file, $layout_name)
The code works fine except it cuts the last character off the result... Instead of returning default or sub when I echo $layout_name[1], it shows defaul and su...
Any ideas what is wrong with my code?
This part is totally up the creek:
[^[_load]]{0}
This is the regex you want:
/^template_(.*)(?<!_load)\.php$/i
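A quick check of the lookbehind pattern against the four file names from the question (the $files and $names arrays are just scaffolding for this sketch):

```php
<?php
$files = [
    'template_default.php',
    'template_default_load.php',
    'template_sub.php',
    'template_sub_load.php',
];

$names = [];
foreach ($files as $layout_file) {
    // (?<!_load) fails when the text right before ".php" is "_load".
    if (preg_match('/^template_(.*)(?<!_load)\.php$/i', $layout_file, $layout_name)) {
        $names[] = $layout_name[1];
    }
}

print_r($names); // contains "default" and "sub"
```

The lookbehind only inspects the characters before ".php" without consuming them, which is why the captured name keeps its last character.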
You'll have to use negative assertions. (Edit: see below for why this particular one doesn't work.)
preg_match('/^template_(.*?)(?!_load)\.php$/i', $layout_file, $layout_name)
Edit: come to think of it, that regexp won't actually work because "_load" will be consumed by the .*? part. My advice: don't use preg_match() to capture the name, use it to skip those that end with _load, e.g.
foreach (glob('template_*') as $filepath)
{
if (preg_match('#_load\\.php$#', $filepath))
{
// The file ends with _load, so skip it
continue;
}
// Passing '.php' as the suffix makes basename() drop the extension
$filename = basename($filepath, '.php');
$layout_name = substr($filename, strlen('template_'));
}
