Regular expression works on some servers and not others - php

I'm having trouble debugging a regular expression. First, the code (this is the complete file... sorry for the lack of line breaks -- see http://pastebin.com/h5CeiY5F for a pastebin):
<?php
$matches = null;
$returnValue = preg_match('#" FirstDownType="[A-Z][0-9]+"/><Play PlayDescription="Penalty[^<]+/>#', '<Play DownDistanceYardline="3-1-GB 7" EarnedFirstDown="False" PlayDescription="(3:40) 71-C.Brown reported in as eligible. 28-M.Ingram left guard to GB 7 for no gain (94-J.Wynn; 95-H.Green)."/><Play DownDistanceYardline="4-1-GB 7" EarnedFirstDown="False" PlayDescription="(3:10) 9-D.Brees pass incomplete short left to 23-P.Thomas."/><Play Header="Green Bay Packers at 3:02"/><Play DownDistanceYardline="1-10-GB 7" EarnedFirstDown="False" PlayDescription="(3:02) 44-J.Starks right tackle to GB 11 for 4 yards (94-C.Jordan)."/><Play DownDistanceYardline="2-6-GB 11" EarnedFirstDown="False" PlayDescription="(2:26) 12-A.Rodgers pass deep right to 85-G.Jennings pushed ob at GB 33 for 22 yards (33-J.Greer)." FirstDownType="P17"/><Play PlayDescription="Penalty on NO-33-J.Greer, Defensive Pass Interference, declined."/><Play DownDistanceYardline="1-10-GB 33" EarnedFirstDown="True" PlayDescription="(2:01) (Shotgun) 12-A.Rodgers pass short left to 85-G.Jennings to GB 47 for 14 yards (21-P.Robinson)." FirstDownType="P18"/><Play DownDistanceYardline="1-10-GB 47" EarnedFirstDown="True" PlayDescription="(1:22) 12-A.Rodgers pass short right to 87-J.Nelson pushed ob at NO 44 for 9 yards (27-M.Jenkins)."/><Play DownDistanceYardline="2-1-NO 44" EarnedFirstDown="False" PlayDescription="(:47) 44-J.Starks right tackle to NO 42 for 2 yards (51-J.Vilma; 94-C.Jordan)." 
FirstDownType="R19"/><Play DownDistanceYardline="1-10-NO 42" EarnedFirstDown="True" PlayDescription="(:07) 25-R.Grant right tackle to NO 40 for 2 yards (51-J.Vilma, 58-S.Shanle)."/><QuarterSummary Team="New Orleans Saints" Score="27" TimeOfPossession="10:47" FirstDownsRushing="3" FirstDownsPassing="5" FirstDownsPenalty="1" FirstDownsTotal="9" ThirdDownEfficiency="1/3" FourthDownEfficiency="0/1"/><QuarterSummary Team="Green Bay Packers" Score="35" TimeOfPossession="4:13" FirstDownsRushing="1" FirstDownsPassing="2" FirstDownsPenalty="0" FirstDownsTotal="3" ThirdDownEfficiency="0/1" FourthDownEfficiency="0/0"/>', $matches);
print_r($matches);
When I run this on several sandboxes (like http://sandbox.onlinephpfunctions.com/ or functions-online.com/preg_match.html), it returns:
Array ( [0] => " FirstDownType="P17"/><Play PlayDescription="Penalty on NO-33-J.Greer, Defensive Pass Interference, declined."/> )
That's the expected output I'm looking for.
However, when I run it on my server (and I've tested it on two different servers), I get:
Array ( [0] => " FirstDownType="P17"/> )
All I can think of is that preg_match changed between PHP 5.3.10 (the version on the sandbox) and PHP 5.3.6 (our version), or that our Ubuntu version is misconfigured?
I'd really appreciate any help. Thanks!

Do you need to match this using a regex? How about using an XML parser instead?
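Try using SimpleXML to get the node(s) you need. Your string is a fragment of sibling elements with no single root, so wrap it in one first ($xml holds the string from the question):
$sXML = new SimpleXMLElement('<xml>'.$xml.'</xml>');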
Then you can use XPath to find the element(s) you need.
$play = $sXML->xpath('//Play[starts-with(@PlayDescription, "Penalty")]/preceding-sibling::Play[1][@FirstDownType]');
This will select the Play element immediately preceding the Play element whose PlayDescription starts with "Penalty", provided it has a FirstDownType attribute.
DEMO: http://codepad.viper-7.com/ECQKcB
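For a self-contained version of the approach above, here is a minimal sketch. It uses a shortened excerpt of the question's data and a neutral `<root>` wrapper element (any root name works). The `[1]` predicate, which restricts the match to the nearest preceding `Play`, is my reading of the intent rather than something the question states.

```php
<?php
// Shortened excerpt of the question's data: bare <Play/> elements without a root.
$xml = '<Play DownDistanceYardline="1-10-GB 7" PlayDescription="(3:02) 44-J.Starks right tackle to GB 11 for 4 yards."/>'
     . '<Play DownDistanceYardline="2-6-GB 11" PlayDescription="(2:26) 12-A.Rodgers pass deep right to 85-G.Jennings." FirstDownType="P17"/>'
     . '<Play PlayDescription="Penalty on NO-33-J.Greer, Defensive Pass Interference, declined."/>';

// Wrap the fragment in a single root element so it parses as one XML document.
$sXML = new SimpleXMLElement('<root>' . $xml . '</root>');

// preceding-sibling::Play[1] is the nearest earlier <Play>;
// [@FirstDownType] keeps it only if it carries that attribute.
$plays = $sXML->xpath('//Play[starts-with(@PlayDescription, "Penalty")]/preceding-sibling::Play[1][@FirstDownType]');

foreach ($plays as $play) {
    echo $play['FirstDownType'], PHP_EOL; // P17
}
```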

Related

Syntax error while parsing ini file in PHP

I have been trying to look into the error but can't seem to solve it. Can anyone help with this? Thank you.
Warning: syntax error, unexpected '(' in D:\langEn.ini on line 4 in D:\Xampp\htdocs\PhpProject1\companyinfo.php on line 12
class CompanyInfo
{
    function parse_files()
    {
        //files created for localization
        $file1 = "D:\langEn.ini";
        $file2 = "D:\jap.ini";
        if(file_exists($file1)) // was: $file1 == TRUE, which is always true for a non-empty string
        {
            print_r(parse_ini_file($file1));
        }
        else
        {
            print_r(parse_ini_file($file2));
        }
    }
}
$obj = new CompanyInfo;
$obj->parse_files();
Contents of D:\langEn.ini:
Company Name:
Unikaihatsu Software Private Limited
HO Address(Mumbai):
33-34, Udyog Bhavan, Sonawala Lane,
Goregaon (East), Mumbai, India, PIN 400-063
Phone:+91-22-26867334 Fax:+91-22-26867334
URL: http://www.usindia.com
Branch Office(Ahemdabad):
Unitech Systems
A/410, Mardia Plaza, Near G. L. S. College,
C. G. Road, Ahmedabad, India, PIN 380-006
Phone:+91-79-26461287 Fax:+91-79-40327081
URL: http://www.usindia.com
Branch Office(Indore):
1st Floor, MPSEDC-STP Building,
Electronics Complex,
Pardeshipura, Indore, India, PIN 452010
Phone : +91-731-4075738 Fax : +91-731-4075738
URL : http://www.usindia.com
I think the reason for the error is that you cannot use certain characters in your ini file.
In your case entries such as HO Address(Mumbai) are invalid due to the brackets ( and ).
From the PHP Manual:
Characters ?{}|&~![()^" must not be used anywhere in the key and have a special meaning in the value.
You can read more in the parse_ini_file() documentation.
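Those restrictions apply to keys. In values, the characters are only special when the value is unquoted, so wrapping a value in double quotes makes them literal. Here is a minimal sketch, assuming the data is restructured with plain identifier keys and double-quoted values. The section and key names are hypothetical. parse_ini_string() is used so the example needs no file; parse_ini_file() treats the same content the same way.

```php
<?php
// Hypothetical restructuring of the question's data: plain keys, quoted values.
$ini = <<<'INI'
[head_office]
label = "HO Address (Mumbai)"
address = "33-34, Udyog Bhavan, Sonawala Lane, Goregaon (East), Mumbai, India, PIN 400-063"
phone = "+91-22-26867334"

[branch_ahmedabad]
label = "Branch Office (Ahmedabad)"
phone = "+91-79-26461287"
INI;

// true = keep the [section] grouping in the result.
$offices = parse_ini_string($ini, true);
print_r($offices);
```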
P.S. The above is the cause of the error you've asked about, but it's not the only problem you will encounter.
You should review your file structure in general, because it does not appear to match the standard ini file format at all. An ini file should generally be in the form:
[simple]
val_one=SomeValue
val_two=567
[simple2]
val_three=SomeOtherValue
val_four=890
where [simple] denotes a section, and then val_one, val_two etc. are keys, and SomeValue, 567 etc. are values. Parsing the above using PHP's parse_ini_* functions would produce either
Array
(
[val_one] => SomeValue
[val_two] => 567
[val_three] => SomeOtherValue
[val_four] => 890
)
or
Array
(
[simple] => Array
(
[val_one] => SomeValue
[val_two] => 567
)
[simple2] => Array
(
[val_three] => SomeOtherValue
[val_four] => 890
)
)
depending on whether the $process_sections flag is set false or true. Live demo: https://3v4l.org/4q82F
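The two results above can also be reproduced with parse_ini_string(). It takes the same $process_sections flag as parse_ini_file() but reads from a string, so no file is needed. Here is a minimal sketch using the example data:

```php
<?php
$ini = <<<'INI'
[simple]
val_one=SomeValue
val_two=567
[simple2]
val_three=SomeOtherValue
val_four=890
INI;

// $process_sections = false (the default): one flat array, section names discarded.
$flat = parse_ini_string($ini);
// $process_sections = true: one nested array per [section].
$sections = parse_ini_string($ini, true);

print_r($flat);
print_r($sections);
```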
Also this is slightly odd data to store in an ini file - these files are normally used to store things like application settings, whereas your office addresses are probably more suited to storing in a database (or at the very least in a JSON file), where there would be a) more structure, and b) fewer restrictions on the use of non-alphanumeric characters.

Upgrade to PHP 5.4.29 (from 5.3?) broke my regex

My host just upgraded PHP to 5.4.29 from one of the 5.3 versions, I believe. This broke a very important regular expression that I use in a frequently used program.
I want to match variations on the following (each line is a separate example):
1 / AGGRAVATED ASSAULT Withdrawn 18 § 2702
1 / Simple Assault Guilty Plea 18 § 2701 §§ A
1 / Criminal Mischief Judgment of Acquittal 18 § 3304-12
This is my regex. It has worked for the last 2 years without fail:
/\d\s+\/\s+(.+)\s{12,}(\w.+?)(?=\s\s)\s{12,}(\w{0,2})\s+(\w{1,2}\s?\247\s?\d+(\-|\247|\w+)*)/
I use it as follows:
if (preg_match(self::$chargesSearch2, $line, $matches))
My expectation is that
matches[1] = the charge (Aggravated assault, etc...)
matches[2] = the grading (which often doesn't appear and isn't on any of these examples)
matches[3] = the disposition (Withdrawn, etc...)
matches[4] = the code section (18 § 2702)
For some reason it doesn't work now--it doesn't match the lines in question. Does anyone see the error?
While I cannot answer the question of why it doesn't work any more, I can tell you that regex is not the right tool for this job. It's kind of like using a flat-headed screwdriver to drive cross-headed screws. Sure, it'll work, but it's easier to use the right tool.
In this case, you should make some kind of basic parser.
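$line = "1 / AGGRAVATED ASSAULT ...";
// Everything before the first "/" is the count.
list($count, $rest) = explode("/", $line, 2);
$count = intval(trim($count));
// Columns are separated by runs of spaces, while a single space can occur inside a
// field ("AGGRAVATED ASSAULT"), so split on two or more whitespace characters.
// array_pad() keeps list() from raising notices when a column is missing.
list($crime, $verdict, $details) = array_pad(preg_split('/\s{2,}/', trim($rest), 3), 3, '');
$crime = trim($crime);
$verdict = trim($verdict);
$details = trim($details);
// I don't know what the significance of the $details is.
// But using the above you should be able to figure out how to parse it :)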
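Here is a minimal sketch of the parsing idea end to end. The sample line is hypothetical: the question's regex expects at least 12 whitespace characters between columns (`\s{12,}`), so the columns are padded with 12 spaces here, and the split is on runs of two or more whitespace characters. The function name parseChargeLine is made up for illustration.

```php
<?php
// Hypothetical helper: splits "count / charge   disposition   code section" into parts.
function parseChargeLine(string $line): array
{
    // Everything before the first "/" is the count.
    list($count, $rest) = explode('/', $line, 2);
    // Columns are separated by runs of spaces; single spaces stay inside a field.
    $fields = preg_split('/\s{2,}/', trim($rest));
    return ['count' => intval(trim($count)), 'fields' => $fields];
}

// Hypothetical sample line with 12-space gaps between the columns.
$gap = str_repeat(' ', 12);
$result = parseChargeLine('1 / AGGRAVATED ASSAULT' . $gap . 'Withdrawn' . $gap . '18 § 2702');
print_r($result);
```

Because an optional column (such as the grading) shifts the positions in `fields`, mapping fields to names still needs a rule for which columns are present.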

Parsing a very hectic space delimited file

I'm trying to help my dad out -- he gave me an export from a scheduling application at his work. We are trying to see if we can import it into a mysql database so he/co-workers can collaborate online with it.
I've tried a number of different methods, but none seem to work right, and this is not my area of expertise.
Export can be seen here: http://roikingon.com/export.txt
Any help / advice on how to go about parsing this would be greatly appreciated!
Thanks!!
I've made an attempt to write a (somewhat dynamic) fixed-width-column parser. Take a look: http://codepad.org/oAiKD0e7 (it's too long for SO, but it's mostly just "data").
What I've noticed
Text-Data is left aligned with padding on the right like "hello___" (_ = space)
Numerical data is right aligned with padding on the left "___42"
If you want to use my code, there's still work to do:
The record types 12.x have a variable column count (after some static columns), so you'd have to implement another "handler" for them.
Some of my widths are most probably wrong. I think there is a system (like numbers being 4 characters long and text 8 characters long, with some variations for special cases). Someone with domain knowledge and more than one sample file could figure out the columns.
Getting the raw-data out is only the first step, you have to map the raw-data to some useful model and write that model to the database.
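The alignment observations above translate into a small fixed-width reader: cut each line at known offsets, then trim the padding. Here is a minimal sketch. The column names, widths, and types in `$columns` and the sample line are all hypothetical, because the real layout of the export still has to be worked out (the codepad code linked above is the fuller version).

```php
<?php
// Hypothetical layout: [column name, width in characters, 'text' or 'number'].
$columns = [
    ['code',  4, 'text'],
    ['name',  8, 'text'],
    ['count', 5, 'number'],
];

function parseFixedWidth(string $line, array $columns): array
{
    $record = [];
    $offset = 0;
    foreach ($columns as [$name, $width, $type]) {
        // (string) covers PHP 7, where substr() past the end returns false.
        $raw = trim((string) substr($line, $offset, $width));
        $offset += $width;
        // Text is padded on the right and numbers on the left; trim() removes both.
        $record[$name] = ($type === 'number' && $raw !== '') ? (int) $raw : $raw;
    }
    return $record;
}

$record = parseFixedWidth('caco' . 'hello   ' . '   42', $columns);
print_r($record);
```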
With that file structure, you basically need to reverse-engineer a proprietary format. Yes, it is space-delimited, but the format does not follow any kind of standard like CSV, YAML etc. It is completely proprietary, with what seems to be a header and separate sections with headers of their own.
I think your best bet is to see if there's some other type of export that can be done, such as Excel or XML, and work from there. If there isn't, see if there's an HTML output of some kind that can be screen-scraped and pasted into Excel, and see what you get.
Due to everything I mentioned above it will be VERY difficult to massage the file in its current form into something that can be sensibly imported into a database. (Note that from the file structure a number of tables would be needed.)
You can use preg_split with a regular expression (one or more spaces).
I will try and let you know.
There doesn't seem to be a structure to your data.
$data = "12.1 0 1144713 751 17 Y 8 517 526 537 542 550 556 561 567 17 ";
$arr = preg_split("/ +/", $data);
print_r($arr);
Array
(
[0] => 12.1
[1] => 0
[2] => 1144713
[3] => 751
[4] => 17
[5] => Y
[6] => 8
[7] => 517
[8] => 526
[9] => 537
[10] => 542
[11] => 550
[12] => 556
[13] => 561
[14] => 567
[15] => 17
[16] =>
)
Try preg_split("/ +/", $data), which splits the line on one or more spaces; then you will have a nice array that you can process. But looking at your data, there is no structure, so you will have to know which array element corresponds to what data.
Good luck.
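The empty element at index 16 in the output above comes from the trailing space in `$data`. This small variation avoids it by trimming the line before splitting:

```php
<?php
$data = "12.1 0 1144713 751 17 Y 8 517 526 537 542 550 556 561 567 17 ";
// trim() drops the trailing space, so the split no longer yields an empty last element.
// (preg_split('/ +/', $data, -1, PREG_SPLIT_NO_EMPTY) would also discard empty pieces.)
$arr = preg_split('/ +/', trim($data));
print_r($arr);
```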
Open it with Excel and save it as comma-delimited. Treat consecutive delimiters as one, or not. Then resave it with Excel as a CSV, which will be comma-separated and easier to import into MySQL.
EDIT:
The guy who says to use preg_split on "/ +/" is giving you essentially the same answer as I just did above.
The question is what to do after that, then.
Have you determined yet how many "row types" there are? Once you've determined that and defined their characteristics it will be a lot easier to write some code to go through it.
If you save it as CSV, you can use PHP's fgetcsv function and related functions. For each row, you would check its type and perform operations depending on the type.
I noticed that your data rows could possibly be divided on whether or not the first column's data contains a "." so here's an example of how you might loop through the file.
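// 'export.csv' is a placeholder for the CSV file you saved from Excel.
$file_handle = fopen('export.csv', 'r');
while (($row = fgetcsv($file_handle)) !== false) {
    if (strpos($row[0], '.') === false) {
        // do something
    } else {
        // do something else
    }
}
fclose($file_handle);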
"do something" would be something like "CREATE TABLE table_$row[0]" or "INSERT INTO table" etc.
Ok, and here's some more observation:
Your file is really like multiple files glued together. It contains multiple formats. Notice that all the rows starting with "4" then have a 4-letter company abbreviation followed by the full company name. One of them is "caco". If you search for "caco", you find it in multiple "tables" within the file.
I also notice "smuwtfa" (days of the week) sprinkled around.
Use clues like that to determine the logic of how to treat each row.
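One way to act on those clues is to group rows by their first field before deciding how to handle each type. Here is a minimal sketch. The sample lines are hypothetical stand-ins modeled on the rows described above; the "abcd" abbreviation and the company names are invented.

```php
<?php
// Hypothetical sample rows; the first space-separated field is treated as the row type.
$lines = [
    "12.1 0 1144713 751 17 Y 8 517",
    "4 caco Some Company Name",
    "12.1 0 1144714 752 18 N 8 520",
    "4 abcd Another Company",
];

$byType = [];
foreach ($lines as $line) {
    $fields = preg_split('/ +/', trim($line));
    $byType[$fields[0]][] = $fields;   // rows collected per type
}

foreach ($byType as $type => $rows) {
    echo $type, ': ', count($rows), " row(s)\n";
}
```

Counting rows per type like this is a quick way to answer "how many row types are there?" before writing a handler for each.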

PHP Library to Generate xdot Files From dot Files

Apologies in advance if I'm misusing terminology; corrections are appreciated. I'm fascinated by directed graphs, but I never had the math/CS background to know what they're really about. I just like the tech because it makes useful diagrams.
I'm trying to create a web application feature that will render a dynamic directed graph to the browser. I recently discovered Canviz, a canvas-based xdot renderer, which I'd like to use.
Canviz is awesome, but it renders xdot files, which (appear?) to contain all the complicated positioning logic:
/* example xdot file */
digraph abstract {
graph [size="6,6"];
node [label="\N"];
graph [bb="0,0,1250,612",
_draw_="c 9 -#ffffffff C 9 -#ffffffff P 4 0 -1 0 612 1251 612 1251 -1 ",
xdotversion="1.2"];
S1 [pos="464,594", width="0.75", height="0.5", _draw_="c 9 -#000000ff e 464 594 27 18 ", _ldraw_="F 14.000000 11 -Times-Roman c 9 -#000000ff T 464 588 0 15 2 -S1 "];
10 [pos="409,522", width="0.75", height="0.5", _draw_="c 9 -#000000ff e 409 522 27 18 ", _ldraw_="F 14.000000 11 -Times-Roman c 9 -#000000ff T 409 516 0 15 2 -10 "];
S1 -> 10 [pos="e,421.43,538.27 451.52,577.66 444.49,568.46 435.57,556.78 427.71,546.5", _draw_="c 9 -#000000ff B 4 452 578 444 568 436 557 428 546 ", _hdraw_="S 5 -solid c 9 -#000000ff C 9 -#000000ff P 3 430 544 421 538 425 548 "];
}
The files I'm generating with my application are dot files, which contain none of this positioning logic:
digraph g {
ranksep=6
node [
fontsize = "16"
shape = "rectangle"
width =3
height =.5
];
edge [
];
S1 -> 10
}
I'm looking for a PHP library that can convert my dot file into an xdot file that can be consumed by Canviz. I realize that the command line program dot can do this, but this is for a redistributable PHP web application, and I'd prefer to avoid any binaries as dependencies.
My core problem: I'm generating dot files based on simple directed relationships, and I want to display the visual graph to end users in a browser. I'd like to do this without having to rely on the presence of a particular binary program on the server. I think the best solution for this is Canviz+PHP to generate xdot files. I'm looking for a PHP library that can do this. However, I'm more than open to other solutions.
Have you looked at Image_GraphViz? It's really just a wrapper for the binary, but from the look of things, I don't think you'll find something better, and this at least keeps you from having to make direct command-line calls from your PHP script.
require_once 'Image/GraphViz.php'; // PEAR package Image_GraphViz
$dot_obj = new Image_GraphViz();
$dot_obj->load('path/to/graph.gv');
$xdot = $dot_obj->fetch('xdot');

With PHP filter a textfile into an A-Z listing

I have a text file that reads:
9123 Bellvue Court
5931 Walnut Creek rd.
Andrew
Bailey
Chris
Drew
Earl
Fred
Gerald
Henry
Ida
Jake
Koman
Larry
Manny
Nomar
Omar
Perry
Quest
Raphael
State
Telleman
Uruvian
Vixan
Whales
Xavier
Yellow
Zebra
What I need to do is create an A-Z listing, like so:
# A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
and when you click on a letter, it will bring up a table with only the words beginning with A if I clicked A, and only the words beginning with numbers if I clicked the # sign.
I was thinking of using a regular expression to accomplish this, but I don't want to create 27 different pages. So is there a way to pass the letter at the end of the URL, creating something like this:
http://mywebsite/directory.php?letter=A
A very simple approach:
Read in the text file:
$inputfile = file('words.txt');
Then, AFTER sanitizing the input ($letter = $_GET['letter']), you can build a regex:
$regex = '/^'.$letter.'/i';
and filter the rows you want to show:
$result = preg_grep($regex, $inputfile);
the rest is then simply a matter of outputting nice HTML (or whatever the output shall be)
Keep in mind: when the pages are read frequently, it is a lot faster to have the data stored in a database. You should also take a look at caching mechanisms if load becomes a problem in the future.
Edit: I forgot to mention that to get the # working, you need to add a line like the following:
if ($letter == '#') $letter = '[0-9]';
to get the regex working again.
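Here is a minimal sketch that puts those steps together. The function name filterByLetter and the sample lines are hypothetical. Instead of escaping $_GET['letter'], it whitelists the input (a single letter or #), which also covers the # case from the edit. Note that a literal # in a URL starts the fragment and is not sent to the server, so the link for that page has to encode it as ?letter=%23.

```php
<?php
// Hypothetical helper: lines starting with $letter (case-insensitive), or with a digit for '#'.
function filterByLetter(array $lines, string $letter): array
{
    if ($letter === '#') {
        $regex = '/^[0-9]/';
    } elseif (strlen($letter) === 1 && ctype_alpha($letter)) {
        $regex = '/^' . $letter . '/i';
    } else {
        return []; // reject anything that is not a single letter or '#'
    }
    // preg_grep() preserves the original keys; array_values() renumbers them from 0.
    return array_values(preg_grep($regex, $lines));
}

// In the page, $lines could come from file('words.txt', FILE_IGNORE_NEW_LINES)
// and the letter from $_GET['letter'].
$lines = ['9123 Bellvue Court', '5931 Walnut Creek rd.', 'Andrew', 'Bailey', 'Chris'];
print_r(filterByLetter($lines, '#'));
print_r(filterByLetter($lines, 'A'));
```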
Yes.
You can access that variable to determine what to sort on by using
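$letter = $_GET["letter"];
// preg_quote() escapes the user input; the m flag makes ^ and $ match at each line,
// and preg_match_all() collects every matching line instead of stopping at the first.
$arrayCount = preg_match_all('/^' . preg_quote($letter, '/') . '.*$/m', $textFileContents, $matches);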
Something like that should work
That'd be mad unless you only have a few names in the file.
Unless you have to be terribly dynamic, tell cron to cache 26 text files (a.htm, etc.) from your central file each hour/day.
Once a day does me, I educated my users to understand that this is how their site would behave.
(A-Z is created from about 10 different applications' content)
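Here is a sketch of what such a scheduled job could do: group the lines by initial once, then write one small page per letter. The function names, the num.htm file name for the # group, and the output directory are hypothetical. A cron entry would simply run this script on the chosen schedule.

```php
<?php
// Groups lines by first character: 'A'..'Z', or '#' for a leading digit.
function groupByInitial(array $lines): array
{
    $groups = [];
    foreach ($lines as $line) {
        $first = strtoupper((string) substr(trim($line), 0, 1));
        if ($first !== '' && ctype_digit($first)) {
            $key = '#';
        } elseif ($first !== '' && ctype_alpha($first)) {
            $key = $first;
        } else {
            continue; // skip blank lines and anything else
        }
        $groups[$key][] = $line;
    }
    ksort($groups);
    return $groups;
}

// Writes a.htm ... z.htm (and num.htm for '#') into $dir.
function writeLetterPages(array $groups, string $dir): void
{
    foreach ($groups as $key => $lines) {
        $name = ($key === '#') ? 'num' : strtolower($key);
        $items = array_map('htmlspecialchars', $lines);
        file_put_contents($dir . '/' . $name . '.htm', '<ul><li>' . implode('</li><li>', $items) . '</li></ul>');
    }
}

// In the cron script: $lines = file('words.txt', FILE_IGNORE_NEW_LINES);
$groups = groupByInitial(['9123 Bellvue Court', 'Andrew', 'Bailey', 'adam']);
writeLetterPages($groups, sys_get_temp_dir());
```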
