I need to write PHP code to solve a math problem.
This is the problem description:
Players A and B are playing a new game of stones. There are N stones
placed on the ground, forming a sequence. The stones are labeled from
1 to N. Players A and B take turns removing exactly two consecutive stones
from the ground until there are no consecutive stones left on the ground.
That is, each player can take stone i and stone i+1, where 1≤i≤N−1. If
the number of stones left is odd, A wins. Otherwise, B wins. Assume
both A and B play optimally and A plays first, do you know who the
winner is?
The line has N stones, indexed from 1 to N (1 ≤ N ≤ 10 000 000).
If the number of stones left is odd, A wins. Otherwise, B wins.
This is my code. It does work, but it is not correct.
<?php
$nStones = rand(1, 10000000);
$string = ("i");
$start = rand(1, 10000000);
$length = 2;
while($nStones > 0) {
substr( $nStones , $start [, $length ]): string;
}
if ($nStones % 2 == 1) {
echo "A";
} else {
echo "B";
}
?>
I think I am missing the alternating removal of two consecutive stones by A and B while $nStones > 0. Furthermore, the problem description mentions optimal removal until there is only one stone left. Therefore I guess the stones move together towards their closest neighbours (the gaps disappear and are filled by the closest stones).
I've made a start here:
<?php
class GameOfStones
{
const STONE_PAIR = 'OO';
const GAP_PAIR = '__';
public $line;
public function __construct($length)
{
$this->line = str_pad('', $length, self::STONE_PAIR);
}
// Removes a pair of stones from the line at nth location.
public function remove($n)
{
if(substr($this->line, $n-1, 2) == self::STONE_PAIR)
$this->line =
substr_replace($this->line, self::GAP_PAIR , $n-1, 2);
else
throw new Exception('Invalid move.');
}
// Check if there are no further possible moves.
public function is_finished()
{
return strpos($this->line, self::STONE_PAIR) === false;
}
// Representation of line.
public function __toString()
{
return implode('.', str_split($this->line)) ."\n";
}
};
$game = new GameOfStones(6);
echo $game;
var_dump($game->is_finished());
$game->remove(5);
echo $game;
var_dump($game->is_finished());
$game->remove(2);
echo $game;
var_dump($game->is_finished());
Output:
O.O.O.O.O.O
bool(false)
O.O.O.O._._
bool(false)
O._._.O._._
bool(true)
Currently this class starts by making a line which is a string of 'O' characters.
So if the length was 5, the line would be a string like this:
OOOOO
The remove method takes a position. If that position is 1, the line is first checked at the string's index 0 (your n-1) for two consecutive Os. In other words: are there stones to remove at the given position? If there are, we do a string replacement at that position and swap the two Os for two _s.
The is_finished method checks the line for the first occurrence of two consecutive Os. In other words, if there are two consecutive stones, there is still a move to play on the line.
The magic method __toString, is the string representation of a GameOfStones object. That's used as a way to visualise the state of the game.
O.O.O.O._._
The above shows four stones and two gaps. (I'm not sure the dot separators are necessary; I used them because adjacent underscores can bleed into each other.)
I have added example use of the code, where (two) pairs of stones are removed from a line of six stones. After each removal we check if there is another possible move, or rather if the game has ended.
There is no player attribution currently, that's left to you.
I am struggling with your last rule:
'If the number of stones left is odd, A wins. Otherwise, B wins.'
See these examples:
i) Line of length 3:
OOO
O__ A (2)
End: one (odd) stone left.
ii) Line of length 4:
OOOO
OO__ A (3)
____ B (1)
End: zero (even) stones left.
iii) Line of length 7:
OOOOOOO
O__OOOO A(2)
O__O__O B(5)
End: three (odd) stones left.
I'd say that the player who removes a pair so that the next player can't move is the winner. In game ii) above, if A had played at position 2 (O__O), they would have prevented B from playing.
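One observation may help with the rule as quoted: every move removes exactly two stones, so the number of stones left always has the same parity as N, whatever the players do. Read literally, the rule then means A wins exactly when N is odd. The sketch below checks this by brute force for short lines. The function names finalCounts() and winner() are made up for this illustration, and the 'O' / '_' string mirrors the representation used in the class above.

```php
<?php
// Collect every possible number of stones left at the end of a game,
// trying all legal moves from the given line ('O' = stone, '_' = gap).
function finalCounts(string $line, array &$seen = []): array
{
    if (isset($seen[$line])) {
        return $seen[$line];
    }
    $counts = [];
    $pos = -1;
    // Try each position where two consecutive stones remain.
    while (($pos = strpos($line, 'OO', $pos + 1)) !== false) {
        $next = substr_replace($line, '__', $pos, 2);
        $counts += finalCounts($next, $seen);
    }
    if ($counts === []) {
        // No move left: the game ends with this many stones on the ground.
        $counts = [substr_count($line, 'O') => true];
    }
    return $seen[$line] = $counts;
}

// Under the literal rule, the winner depends only on the parity of N.
function winner(int $n): string
{
    return $n % 2 === 1 ? 'A' : 'B';
}

for ($n = 1; $n <= 12; $n++) {
    $counts = array_keys(finalCounts(str_repeat('O', $n)));
    echo $n, ': stones left can be ', implode(' or ', $counts), ' => ', winner($n), "\n";
}
```

If the intended rule is really "the last player able to move wins", as suggested above, parity is no longer enough and the search would have to track whose turn it is.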
I am trying to generate some SKU numbers and I came to an issue that has made me think, and as I slept less than 2 hours I decided to ask you guys, Stackoverflowers.
Let's say I've got an array of the alphabet excluding commonly mistaken letters.
$alphabet = array("A","C","D","E","F","G","H","I","J","K","L","M","N","P","Q","R","S","T","U","V","W","X","Y","Z");
I am trying to generate 2 letters based on a sequential number. Let's say I've got sub-products that should have that suffix at the end of their SKU. For the first sub-product the SKU will have the suffix AA, for the 24th AZ, for the 25th CA, for the 26th CC and so on. The thing is that we don't want repeating suffixes, but AC and CA are both acceptable.
Thank you for doing the dirty job for a sleep needing programmer.
Making it clear:
I want to get a combination based on the iteration. Let's say:
$i = 1, then $suffix = AA;
$i = 2, then $suffix = AC;
$i = 24, then $suffix = AZ;
$i = 25 (one above the count of the array), then $suffix = CA;
$i = 26, then $suffix = CC;
$i = 49, then $suffix = DA (**I suppose**)
Let's say I have sub-products for product 1 and sub-products for product 2. Product 1's sub-products' suffixes should be:
AA, AC, AD, AE .... AZ, CA, CC, CD .... CZ .... ZA, ZC ... ZY.
Product 2's sub-products' suffixes can also be the same!
I would use the product number to pick two indexes from the list of available letters. The following can be done in one line, but I have expanded it so I can explain what it does.
function get2letter($num)
{
$alphabet = array("A","C","D","E","F","G","H","I","J","K","L","M","N","P","Q","R","S","T","U","V","W","X","Y","Z");
$code = $alphabet[$num%sizeof($alphabet)]; // Get the first letter
$num = floor($num/sizeof($alphabet)); // Remove the value used to get the first letter
$code.= $alphabet[$num%sizeof($alphabet)]; // Get the second letter
return $code;
}
for($i=0; $i<10; $i++)
print get2letter($i)."\n";
This will work for small values. With 24 letters there are only 24 × 24 = 576 unique two-letter codes, so you get collisions once the number goes beyond that.
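For the numbering used in the question (1 gives AA, 24 gives AZ, 25 gives CA), a variant could look like the sketch below. It assumes a 1-based counter and refuses numbers outside the 576 available suffixes instead of silently repeating them. The name suffixFor() is made up for this example.

```php
<?php
// Map a 1-based counter to a two-letter suffix: 1 => AA, 24 => AZ, 25 => CA.
function suffixFor(int $i): string
{
    $alphabet = array("A","C","D","E","F","G","H","I","J","K","L","M","N","P","Q","R","S","T","U","V","W","X","Y","Z");
    $base = count($alphabet);           // 24 letters
    if ($i < 1 || $i > $base * $base) { // only 24 * 24 = 576 unique suffixes exist
        throw new OutOfRangeException("No unique suffix for $i");
    }
    $n = $i - 1;                        // switch to a 0-based index
    // The first letter changes every 24 steps, the second letter on every step.
    return $alphabet[intdiv($n, $base)] . $alphabet[$n % $base];
}

foreach (array(1, 2, 24, 25, 26, 49, 576) as $i) {
    echo $i, ' => ', suffixFor($i), "\n";
}
```

Because each product has its own counter, two products can share suffixes, which the question says is acceptable.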
I don't think this is a good design, because the solution will run out over time: the number of available suffixes is limited.
If you really want to do this, I think you should build an array of all the possible suffixes, indexed by number. When you create a new product, look up how many products you already have and take the next suffix.
php: sort and count instances of words in a given string
From this article I know how to count the instances of words in a given string and sort them by frequency. Now I want to go a step further: match the resulting words against another array ($keywords) and get only the top 5 words. I do not know how to do that, so I am opening a question. Thanks.
$txt = <<<EOT
The 2013 Monaco Grand Prix (formally known as the Grand Prix de Monaco 2013) was a Formula One motor race that took place on 26 May 2013 at the Circuit de Monaco, a street circuit that runs through the principality of Monaco. The race was won by Nico Rosberg for Mercedes AMG Petronas, repeating the feat of his father Keke Rosberg in the 1983 race. The race was the sixth round of the 2013 season, and marked the seventy-second time the Monaco Grand Prix has been held. Rosberg had started the race from pole.
Background
Mercedes protest
Just before the race, Red Bull and Ferrari filed an official protest against Mercedes, having learned on the night before the race of a three-day tyre test undertaken by Pirelli at the venue of the last grand prix using Mercedes' car driven by both Hamilton and Rosberg. They claimed this violated the rule against in-season testing and gave Mercedes a competitive advantage in both the Monaco race and the next race, which would both be using the tyre that was tested (with Pirelli having been criticised following some tyre failures earlier in the season, the tests had been conducted on an improved design planned to be introduced two races after Monaco). Mercedes stated the FIA had approved the test. Pirelli cited their contract with the FIA which allows limited testing, but Red Bull and Ferrari argued this must only be with a car at least two years old. It was the second test conducted by Pirelli in the season, the first having been between race 4 and 5, but using a 2011 Ferrari car.[4]
Tyres
Tyre supplier Pirelli brought its yellow-banded soft compound tyre as the harder "prime" tyre and the red-banded super-soft compound tyre as the softer "option" tyre, just as they did the previous two years. It was the second time in the season that the super-soft compound was used at a race weekend, as was the case with the soft tyre compound.
EOT;
$words = array_count_values(str_word_count($txt, 1));
arsort($words);
var_dump($words);
$keywords = array("Monaco","Prix","2013","season","Formula","race","motor","street","Ferrari","Mercedes","Hamilton","Rosberg","Tyre");
// Goal: keep only the entries of $words that match the $keywords array, then get the top 5 words.
You already have $words as an associative array, indexed by the word and with the count as the value, so we use array_flip() to make your $keywords array an associative array indexed by word as well. Then we can use array_intersect_key() to return only those entries from $words that have a matching index entry in our flipped $keywords array.
This gives a resulting $matchWords array, still keyed by the word, but containing only those entries from the original $words array that match $keywords; and still sorted by frequency.
We then simply use array_slice() to extract the first 5 entries from that array.
$matchWords = array_intersect_key(
$words,
array_flip($keywords)
);
$matchWords = array_slice($matchWords, 0, 5);
var_dump($matchWords);
gives
array(5) {
'race' =>
int(11)
'Monaco' =>
int(7)
'Mercedes' =>
int(5)
'Rosberg' =>
int(4)
'season' =>
int(4)
}
Caveat: You could have problems with case-sensitivity. "Race" !== "race", so the $words = array_count_values(str_word_count($txt, 1)); line will treat these as two different words.
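One way around the case-sensitivity caveat is to lowercase both the text and the keywords before counting. This is only a sketch on a short made-up sample text, not the article from the question, and it uses strtolower(), so it assumes plain ASCII words.

```php
<?php
// Short made-up sample; the real $txt from the question would work the same way.
$txt = 'The race was won by Rosberg. Race day in Monaco: the Monaco race, tyre and Tyre.';
$keywords = array('Monaco', 'race', 'Rosberg', 'Tyre', 'season');

// Count words case-insensitively by lowercasing the text first.
$words = array_count_values(str_word_count(strtolower($txt), 1));
arsort($words);

// Lowercase the keywords too, so the array keys can match.
$matchWords = array_intersect_key(
    $words,
    array_flip(array_map('strtolower', $keywords))
);
$matchWords = array_slice($matchWords, 0, 5);
print_r($matchWords);
```

The keys of the result are lowercase, so "Race" and "race" end up in a single entry.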
I am creating a Bible search. The trouble with Bible searches is that people often enter different kinds of searches, and I need to split them up accordingly. So I figured the best way to start out would be to remove all spaces and work through the string from there. Different types of searches could be:
Genesis 1:1 - Genesis Chapter 1, Verse 1
1 Kings 2:5 - 1 Kings Chapter 2, Verse 5
Job 3 - Job Chapter 3
Romans 8:1-7 - Romans Chapter 8 Verses 1 to 7
1 John 5:6-11 - 1 John Chapter 5 Verses 6 - 11.
I am not too fazed by the different types of searches, but if anyone can find a simpler way to do this, or knows of a great way to do it, then please tell me how!
Thanks
The easiest thing to do here is to write a regular expression to capture the text, then parse out the captures to see what you got. To start, let's assume you have your test bench:
$tests = array(
'Genesis 1:1' => 'Genesis Chapter 1, Verse 1',
'1 Kings 2:5' => '1 Kings Chapter 2, Verse 5',
'Job 3' => 'Job Chapter 3',
'Romans 8:1-7' => 'Romans Chapter 8, Verses 1 to 7',
'1 John 5:6-11' => '1 John Chapter 5, Verses 6 to 11'
);
So, you have, from left to right:
A book name, optionally prefixed with a number
A chapter number
A verse number, optional, optionally followed by a range.
So, we can write a regex to match all of those cases:
((?:\d+\s)?\w+)\s+(\d+)(?::(\d+(?:-\d+)?))?
And now see what we get back from the regex:
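// The pattern shown above, wrapped in delimiters
$regex = '/((?:\d+\s)?\w+)\s+(\d+)(?::(\d+(?:-\d+)?))?/';
foreach( $tests as $test => $answer) {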
// Match the regex against the test case
preg_match( $regex, $test, $match);
// Ignore the first entry, the 2nd and 3rd entries hold the book and chapter
list( , $book, $chapter) = array_map( 'trim', $match);
$output = "$book Chapter $chapter";
// If the fourth match exists, we have a verse entry
if( isset( $match[3])) {
// If there is no dash, it's a single verse
if( strpos( $match[3], '-') === false) {
$output .= ", Verse " . $match[3];
} else {
// Otherwise it's a range of verses
list( $start, $end) = explode( '-', $match[3]);
$output .= ", Verses $start to $end";
}
}
// Here $output matches the value in $answer from our test cases
echo $answer . "\n" . $output . "\n\n";
}
You can see it working in this demo.
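If you would rather get structured data back than a formatted string, a variant with named groups might look like the sketch below. The function name parseReference() and the array keys are assumptions made for this example, and the pattern only handles single-word book names with an optional leading number, as in the test cases above.

```php
<?php
// Parse a reference such as "1 John 5:6-11" into its parts, or return null
// when the input does not look like a reference.
function parseReference(string $search): ?array
{
    $regex = '/^\s*(?<book>(?:\d+\s+)?[A-Za-z]+)\s+(?<chapter>\d+)'
           . '(?::(?<start>\d+)(?:-(?<end>\d+))?)?\s*$/';
    if (!preg_match($regex, $search, $m)) {
        return null;
    }
    // Optional groups that did not take part in the match are missing or empty.
    return array(
        'book'       => $m['book'],
        'chapter'    => (int) $m['chapter'],
        'startVerse' => isset($m['start']) && $m['start'] !== '' ? (int) $m['start'] : null,
        'endVerse'   => isset($m['end']) && $m['end'] !== '' ? (int) $m['end'] : null,
    );
}

print_r(parseReference('1 John 5:6-11'));
print_r(parseReference('Job 3'));
```

Anchoring the pattern with ^ and $ means that input with extra text around the reference is rejected rather than partially matched.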
I think I understand what you are asking here. You want to devise an algorithm that extracts information (e.g. book name, chapter, verse or verses).
This looks to me like a job for pattern matching (e.g. regular expressions), because you could then define patterns, extract data for all the scenarios that make sense and work from there.
There are actually quite a few variants that could exist, so perhaps you should also take a look at natural language processing. Fuzzy string matching on names could provide better results (e.g. when people misspell book names).
Best of luck
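As a rough illustration of the fuzzy matching idea, PHP's built-in levenshtein() can pick the closest known book name for a misspelled one. The book list here is a shortened, hypothetical one and closestBook() is a name made up for this sketch. A real search would probably also want a maximum distance, so that unrelated input is rejected.

```php
<?php
// Hypothetical, shortened list of book names; a real search would use the full list.
$books = array('Genesis', 'Job', 'Romans', 'Kings', 'John');

// Return the known book name with the smallest edit distance to the input.
function closestBook(string $input, array $books): string
{
    $best = $books[0];
    $bestDistance = PHP_INT_MAX;
    foreach ($books as $book) {
        // Compare case-insensitively by lowercasing both sides.
        $distance = levenshtein(strtolower($input), strtolower($book));
        if ($distance < $bestDistance) {
            $best = $book;
            $bestDistance = $distance;
        }
    }
    return $best;
}

echo closestBook('Genisis', $books), "\n";
echo closestBook('romens', $books), "\n";
```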
Try out something based on preg_match_all, like:
$ php -a
Interactive shell
php > $s = '1 kings 2:4 and 1 sam 4-5';
php > preg_match_all("/(\\d*|[^\\d ]*| *)/", $s, $parts);
php > print serialize($parts);
Okay, well, I am not too sure about regular expressions and I haven't studied them yet, so I am stuck with the more procedural approach. I have made the following (which is still a huge improvement on the code I wrote 5 years ago, which was what I was aiming to achieve). It seems to work flawlessly:
You need this function first of all:
function varType($str) {
if(is_numeric($str)) {return false;}
if(is_string($str)) {return true;}
}
$bible = array("BookNumber" => "", "Book" => "", "Chapter" => "", "StartVerse" => "", "EndVerse" => "");
$pos = 1; // 1 - Book Number
// 2 - Book
// 3 - Chapter
// 4 - ':' or 'v'
// 5 - StartVerse
// 6 - is a dash for spanning verses '-'
// 7 - EndVerse
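// $collapse is the search string with all spaces removed (example input shown here).
$collapse = str_replace(' ', '', '1 John 5:6-11');
$scan = ""; $compile = array();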
//Divide into character type groups.
for($x=0;$x<=(strlen($collapse)-1);$x++)
{ if($x>=1) {if(varType($collapse[$x]) != varType($collapse[$x-1])) {array_push($compile,$scan);$scan = "";}}
$scan .= $collapse[$x];
if($x==strlen($collapse)-1) {array_push($compile,$scan);}
}
//If the first element is not a number, then it is not a numbered book (AKA 1 John, 2 Kings), So move the position forward.
if(varType($compile[0])) {$pos=2;}
foreach($compile as $val)
{ if(!varType($val))
{ switch($pos)
{ case 1: $bible['BookNumber'] = $val; break;
case 3: $bible['Chapter'] = $val; break;
case 5: $bible['StartVerse'] = $val; break;
case 7: $bible['EndVerse'] = $val; break;
}
} else {switch($pos)
{ case 2: $bible['Book'] = $val; break;
case 4: //Colon or 'v'
case 6: break; //Dash for verse spanning.
}}
$pos++;
}
This will give you an array called $bible at the end that holds all the necessary data to run against an SQL database or whatever else you might want it for. Hope this helps others.
I know this is crazy talk, but why not just have a form with 4 fields so they can specify:
Book
Chapter
Starting Verse
Ending Verse [optional]
Good morning -
I'm interested in seeing an efficient way of parsing the values of a hierarchical text file (i.e., one that has a Title => Multiple Headings => Multiple Subheadings => Multiple Keys => Multiple Values) into a simple XML document. For the sake of simplicity, the answer would be written using:
Regex (preferably in PHP)
or, PHP code (e.g., if looping were more efficient)
Here's an example of an Inventory file I'm working with. Note that Header = FOODS, Sub-Header = Type (A, B...), Keys = PRODUCT (or CODE, etc.) and Values may have one or more lines.
**FOODS - TYPE A**
___________________________________
**PRODUCT**
1) Mi Pueblito Queso Fresco Authentic Mexican Style Fresh Cheese;
2) La Fe String Cheese
**CODE**
Sell by date going back to February 1, 2009
**MANUFACTURER**
Quesos Mi Pueblito, LLC, Passaic, NJ.
**VOLUME OF UNITS**
11,000 boxes
**DISTRIBUTION**
NJ, NY, DE, MD, CT, VA
___________________________________
**PRODUCT**
1) Peanut Brittle No Sugar Added;
2) Peanut Brittle Small Grind;
3) Homestyle Peanut Brittle Nuggets/Coconut Oil Coating
**CODE**
1) Lots 7109 - 8350 inclusive;
2) Lots 8198 - 8330 inclusive;
3) Lots 7075 - 9012 inclusive;
4) Lots 7100 - 8057 inclusive;
5) Lots 7152 - 8364 inclusive
**MANUFACTURER**
Star Kay White, Inc., Congers, NY.
**VOLUME OF UNITS**
5,749 units
**DISTRIBUTION**
NY, NJ, MA, PA, OH, FL, TX, UT, CA, IA, NV, MO and IN
**FOODS - TYPE B**
___________________________________
**PRODUCT**
Cool River Bebidas Naturales - West Indian Cherry Fruit Acerola 16% Juice;
**CODE**
990-10/2 10/5
**MANUFACTURER**
San Mar Manufacturing Corp., Catano, PR.
**VOLUME OF UNITS**
384
**DISTRIBUTION**
PR
And here's the desired output (please excuse any XML syntactical errors):
<foods>
<food type = "A" >
<product>Mi Pueblito Queso Fresco Authentic Mexican Style Fresh Cheese</product>
<product>La Fe String Cheese</product>
<code>Sell by date going back to February 1, 2009</code>
<manufacturer>Quesos Mi Pueblito, LLC, Passaic, NJ.</manufacturer>
<volume>11,000 boxes</volume>
<distribution>NJ, NY, DE, MD, CT, VA</distribution>
</food>
<food type = "A" >
<product>Peanut Brittle No Sugar Added</product>
<product>Peanut Brittle Small Grind</product>
<product>Homestyle Peanut Brittle Nuggets/Coconut Oil Coating</product>
<code>Lots 7109 - 8350 inclusive</code>
<code>Lots 8198 - 8330 inclusive</code>
<code>Lots 7075 - 9012 inclusive</code>
<code>Lots 7100 - 8057 inclusive</code>
<code>Lots 7152 - 8364 inclusive</code>
<manufacturer>Star Kay White, Inc., Congers, NY.</manufacturer>
<volume>5,749 units</volume>
<distribution>NY, NJ, MA, PA, OH, FL, TX, UT, CA, IA, NV, MO and IN</distribution>
</food>
<food type = "B" >
<product>Cool River Bebidas Naturales - West Indian Cherry Fruit Acerola 16% Juice</product>
<code>990-10/2 10/5</code>
<manufacturer>San Mar Manufacturing Corp., Catano, PR</manufacturer>
<volume>384</volume>
<distribution>PR</distribution>
</food>
</foods>
<!-- and so forth -->
So far, my approach (which might be quite inefficient with a huge text file) would be one of the following:
Loops and multiple Select/Case statements, where the file is loaded into a string buffer, and while looping through each line, see if it matches one of the header/subheader/key lines, append the appropriate xml tag to a xml string variable, and then add the child nodes to the xml based on IF statements regarding which key name is most recent (which seems time-consuming and error-prone, esp. if the text changes even slightly) -- OR
Use REGEX (Regular Expressions) to find and replace key fields with appropriate xml tags, clean it up with an xml library, and export the xml file. Problem is, I barely use regular expressions, so I'd need some example-based help.
Any help or advice would be appreciated.
Thanks.
An example you can use as a starting point. At least I hope it gives you an idea...
<?php
define('TYPE_HEADER', 1);
define('TYPE_KEY', 2);
define('TYPE_DELIMETER', 3);
define('TYPE_VALUE', 4);
$datafile = 'data.txt';
$fp = fopen($datafile, 'rb') or die('!fopen');
// stores (the first) {header} in 'name' and the root simplexmlelement in 'element'
$container = array('name'=>null, 'element'=>null);
// stores the name for each item element, the value for the type attribute for subsequent item elements and the simplexmlelement of the current item element
$item = array('name'=>null, 'type'=>null, 'current_element'=>null);
// the last **key** encountered, used to create new child elements in the current item element when a value is encountered
$key = null;
while ( false!==($t=getstruct($fp)) ) {
switch( $t[0] ) {
case TYPE_HEADER:
if ( is_null($container['element']) ) {
// this is the first time we hit **header - subheader**
$container['name'] = $t[1][0];
// ugly hack, < . name . />
$container['element'] = new SimpleXMLElement('<'.$container['name'].'/>');
// each subsequent new item gets the new subheader as type attribute
$item['type'] = $t[1][1];
// dummy implementation: "deducting" the item names from header/container[name]
$item['name'] = substr($t[1][0], 0, -1);
}
else {
// hitting **header - subheader** the (second, third, nth) time
/*
header must be the same as the first time (stored in container['name']).
Otherwise you need another container element since
xml documents can only have one root element
*/
if ( $container['name'] !== $t[1][0] ) {
echo $container['name'], "!==", $t[1][0], "\n";
die('format error');
}
else {
// subheader may have changed, store it for future item elements
$item['type'] = $t[1][1];
}
}
break;
case TYPE_DELIMETER:
assert( !is_null($container['element']) );
assert( !is_null($item['name']) );
assert( !is_null($item['type']) );
/* that's maybe not a wise choice.
You might want to check the complete item before appending it to the document.
But the example is a hack anyway ...so create a new item element and append it to the container right away
*/
$item['current_element'] = $container['element']->addChild($item['name']);
// set the type-attribute according to the last **header - subheader** encountered
$item['current_element']['type'] = $item['type'];
break;
case TYPE_KEY:
$key = $t[1][0];
break;
case TYPE_VALUE:
assert( !is_null($item['current_element']) );
assert( !is_null($key) );
// this is a value belonging to the "last" key encountered
// create a new "key" element with the value as content
// and add it to the current item element
$tmp = $item['current_element']->addChild($key, $t[1][0]);
break;
default:
die('unknown token');
}
}
if ( !is_null($container['element']) ) {
$doc = dom_import_simplexml($container['element']);
$doc = $doc->ownerDocument;
$doc->formatOutput = true;
echo $doc->saveXML();
}
die;
/*
Take a look at gettoken() at http://www.tuxradar.com/practicalphp/21/5/6
It breaks the stream into much simpler pieces.
In the next step the parser would "combine" or structure the simple tokens into more complex things.
This function does both....
#return array(id, array(parameter)
*/
function getstruct($fp) {
if ( feof($fp) ) {
return false;
}
// shortcut: all we care about "happens" on one line
// so let php read one line in a single step and then do the pattern matching
$line = trim(fgets($fp));
// this matches **key** and **header - subheader**
if ( preg_match('#^\*\*([^-]+)(?:-(.*))?\*\*$#', $line, $m) ) {
// only for **header - subheader** $m[2] is set.
if ( isset($m[2]) ) {
return array(TYPE_HEADER, array(trim($m[1]), trim($m[2])));
}
else {
return array(TYPE_KEY, array($m[1]));
}
}
// this matches _____________ and means "new item"
else if ( preg_match('#^_+$#', $line, $m) ) {
return array(TYPE_DELIMETER, array());
}
// any other non-empty line is a single value
else if ( preg_match('#\S#', $line) ) {
// you might want to filter the 1),2),3) part out here
// could also be two different token types
return array(TYPE_VALUE, array($line));
}
else {
// skip empty lines, would be nicer with tail-recursion...
return getstruct($fp);
}
}
prints
<?xml version="1.0"?>
<FOODS>
<FOOD type="TYPE A">
<PRODUCT>1) Mi Pueblito Queso Fresco Authentic Mexican Style Fresh Cheese;</PRODUCT>
<PRODUCT>2) La Fe String Cheese</PRODUCT>
<CODE>Sell by date going back to February 1, 2009</CODE>
<MANUFACTURER>Quesos Mi Pueblito, LLC, Passaic, NJ.</MANUFACTURER>
<VOLUME OF UNITS>11,000 boxes</VOLUME OF UNITS>
<DISTRIBUTION>NJ, NY, DE, MD, CT, VA</DISTRIBUTION>
</FOOD>
<FOOD type="TYPE A">
<PRODUCT>1) Peanut Brittle No Sugar Added;</PRODUCT>
<PRODUCT>2) Peanut Brittle Small Grind;</PRODUCT>
<PRODUCT>3) Homestyle Peanut Brittle Nuggets/Coconut Oil Coating</PRODUCT>
<CODE>1) Lots 7109 - 8350 inclusive;</CODE>
<CODE>2) Lots 8198 - 8330 inclusive;</CODE>
<CODE>3) Lots 7075 - 9012 inclusive;</CODE>
<CODE>4) Lots 7100 - 8057 inclusive;</CODE>
<CODE>5) Lots 7152 - 8364 inclusive</CODE>
<MANUFACTURER>Star Kay White, Inc., Congers, NY.</MANUFACTURER>
<VOLUME OF UNITS>5,749 units</VOLUME OF UNITS>
<DISTRIBUTION>NY, NJ, MA, PA, OH, FL, TX, UT, CA, IA, NV, MO and IN</DISTRIBUTION>
</FOOD>
<FOOD type="TYPE B">
<PRODUCT>Cool River Bebidas Naturales - West Indian Cherry Fruit Acerola 16% Juice;</PRODUCT>
<CODE>990-10/2 10/5</CODE>
<MANUFACTURER>San Mar Manufacturing Corp., Catano, PR.</MANUFACTURER>
<VOLUME OF UNITS>384</VOLUME OF UNITS>
<DISTRIBUTION>PR</DISTRIBUTION>
</FOOD>
</FOODS>
Unfortunately the status of the php module for ANTLR currently is "Runtime is in alpha status." but it might be worth a try anyway...
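Two details of the printed result may need attention: a key such as "VOLUME OF UNITS" is not a valid XML element name because of the spaces, and the values still carry the "1)" counters and trailing semicolons. A possible clean-up is sketched below. The helper names elementName() and cleanValue() are made up for this example, and mapping keys to shorter names such as volume would need an explicit lookup table instead.

```php
<?php
// Turn a key such as "VOLUME OF UNITS" into a lowercase element name
// without spaces. Keys that start with a digit would still be invalid.
function elementName(string $key): string
{
    $name = strtolower(trim($key));
    // Replace every run of characters that is not a letter or digit with "_".
    $name = preg_replace('/[^a-z0-9]+/', '_', $name);
    return trim($name, '_');
}

// Strip a leading "1) " style counter and a trailing ";" from a value line.
function cleanValue(string $line): string
{
    $line = preg_replace('/^\d+\)\s*/', '', trim($line));
    return rtrim($line, "; \t");
}

$item = new SimpleXMLElement('<food type="A"/>');
$item->addChild(elementName('VOLUME OF UNITS'), cleanValue('11,000 boxes'));
$item->addChild(elementName('PRODUCT'), cleanValue('2) Peanut Brittle Small Grind;'));
echo $item->asXML(), "\n";
```

In the parser above, the natural places to call these helpers would be the TYPE_KEY and TYPE_VALUE branches.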
See: http://www.tuxradar.com/practicalphp/21/5/6
This tells you how to parse a text file into tokens using PHP. Once parsed you can place it into anything you want.
You need to search for specific tokens in the file based on your criteria:
for example:
PRODUCT
This gives you the XML Tag
Then 1) can have special meaning
1) Peanut Brittle...
This tells you what to put in the XML tag.
I do not know if this is the most efficient way to accomplish your task, but it is the way a compiler would parse a file and it has the potential to be very accurate.
Instead of Regex or PHP use the XSLT 2.0 unparsed-text() function to read the file (see http://www.biglist.com/lists/xsl-list/archives/200508/msg00085.html)
Another Hint for an XSLT 1.0 Solution is here: http://bytes.com/topic/net/answers/808619-read-plain-file-xslt-1-0-a