I have had a problem for a few days now.
I'm trying to get some changing data inside a string, the string is something like this:
<docdata>
<!-- News Identifier -->
<doc-id id-string ="YBY15349" />
<!-- Date of issue -->
<date.issue norm ="2012-09-22 19:52" />
<!-- Date of release -->
<date.release norm ="2012-09-22 19:52" />
</docdata>
What I need is only the date inside the quotes, i.e. "2012-09-22 19:52". The string is stored in some kind of XML, which is malformed, by the way, so I can't use a normal XML parser. I already load/read the file to change the charset:
$fname = $file;
$fhandle = fopen($fname,"r");
$content = fread($fhandle,filesize($fname));
str_replace("<?xml version=\"1.0\" encoding=\"UTF-8\"?>", "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>", $content);
etc..
This works like a charm, but I can't use it to get the string.
I tried preg_match_all, but I can't get it right.
Is there a simple way to search for this value
<date.issue norm ="2012-09-22 19:52" />
and get just the date in a variable?
Thanks in advance, and sorry for my English.
From the PHP documentation:
file_get_contents() is the preferred way to read the contents of a file into a string. It will use memory mapping techniques if supported by your OS to enhance performance.
Consequently, your code would become:
$content = file_get_contents($file);
$content = str_replace("<?xml version=\"1.0\" encoding=\"UTF-8\"?>", "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>", $content);
preg_match_all('/date\.issue norm ="([^"]+)" /', $content, $date);
The default behavior is to store the parenthesized matches in the array $date[1]. Therefore, you might loop through $date[1][0], $date[1][1], and so on.
A regular expression to match the following:
<date.issue norm ="2012-09-22 19:52" />
Would be:
/<date\.issue\s*norm\s*="([^"]*)"/
In code:
preg_match_all('/<date\.issue\s*norm\s*="([^"]*)"/', $content, $matches);
// $matches[1] contains all the dates
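If only a single date is needed, preg_match with the same kind of pattern may be enough. A minimal sketch, assuming the sample from the question is held in $content (the helper name extractIssueDate is made up for illustration):

```php
<?php
// Hypothetical helper: returns the first date.issue value, or null if none is found.
function extractIssueDate($content)
{
    // \s* tolerates optional whitespace around "norm" and "=".
    if (preg_match('/<date\.issue\s*norm\s*=\s*"([^"]*)"/', $content, $m)) {
        return $m[1]; // first capture group: the text between the quotes
    }
    return null;
}

$content = '<docdata>
<doc-id id-string ="YBY15349" />
<date.issue norm ="2012-09-22 19:52" />
<date.release norm ="2012-09-22 19:52" />
</docdata>';

$date = extractIssueDate($content);
echo $date === null ? "no date found\n" : $date . "\n";
```

Because the pattern works on the raw string, it does not matter that the surrounding XML is malformed.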
Instead of using
fopen($filename)
use
$filename = '/path/to/file.xml';
$filearray = file($filename); // pulls the whole file into an array of lines
$searchstr = 'date.issue';
foreach($filearray as $line) {
if(stristr($line,$searchstr)) {
$linearray = explode('"',$line);
// your date should be $linearray[1];
echo $linearray[1]."\n"; // to test your output
// rest of your code here
}
}
This way you search the whole file for your search string, and the malformed XML shouldn't be a problem.
I have an xml string. That xml string has to be converted into PHP array in order to be processed by other parts of software my team is working on.
For xml -> array conversion i'm using something like this:
if(get_class($xmlString) != 'SimpleXMLElement') {
$xml = simplexml_load_string($xmlString);
}
if(!$xml) {
return false;
}
It works fine - most of the time :) The problem arises when my "xmlString" contains something like this:
<Line0 User="-5" ID="7436194"><Node0 Key="<1" Value="0"></Node0></Line0>
Then, simplexml_load_string won't do its job (and I know that's because of the "<" character).
As i can't influence any other part of the code (i can't open up a module that's generating XML string and tell it "encode special characters, please!") i need your suggestions on how to fix that problem BEFORE calling "simplexml_load_string".
Do you have some ideas? I've tried
str_replace("<","&lt;",$xmlString)
but, that simply ruins entire "xmlString"... :(
Well, then you can just replace the special characters inside the quoted attribute values of $xmlString with their HTML entity counterparts, using htmlspecialchars() and preg_replace_callback().
I know this is not performance friendly, but it does the job :)
<?php
$xmlString = '<Line0 User="-5" ID="7436194"><Node0 Key="<1" Value="0"></Node0></Line0>';
$xmlString = preg_replace_callback('~(?:").*?(?:")~',
function ($matches) {
return htmlspecialchars($matches[0], ENT_NOQUOTES);
},
$xmlString
);
header('Content-Type: text/plain');
echo $xmlString; // you will see the special characters are converted to HTML entities :)
echo PHP_EOL . PHP_EOL; // tidy :)
$xmlobj = simplexml_load_string($xmlString);
var_dump($xmlobj);
?>
I am getting the text between two tags with PHP (from an HTML page).
The sample code I use is this:
function GDes($url) {
$fp = file_get_contents($url);
if (!$fp) return false;
$res = preg_match("/<description>(.*)<\/description>/siU", $fp, $title_matches);
if (!$res) return false;
$description = preg_replace('/\s+/', ' ', $title_matches[1]);
$description = trim($description);
return $description;
}
It returns the text between the description tags, but my problem is that if the page has two description tags, it returns the first one, which I don't need.
I need to get the second one.
For example, if my HTML is this:
<description>No need to this</description>
<description>I NEED THIS ONE</description>
I need the function above to return the second description tag.
What changes does the function need?
Use preg_match_all instead. It will create an array with all matches.
You can keep your code as is, just replace preg_match with preg_match_all.
Then you have to use $title_matches[1][1] instead of $title_matches[1] in your preg_replace call, since the $title_matches is now a multidimensional array.
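The change described above could look roughly like this. It is only a sketch: the helper name nthDescription and its 1-based $n parameter are assumptions for illustration, and it takes the HTML as a string rather than fetching a URL:

```php
<?php
// Hypothetical helper: returns the $n-th (1-based) <description> text, or false if absent.
function nthDescription($html, $n)
{
    // /s lets . match newlines, /i ignores case, /U makes quantifiers lazy.
    $count = preg_match_all('/<description>(.*)<\/description>/siU', $html, $matches);
    if (!$count || $n < 1 || $count < $n) {
        return false;
    }
    // $matches[1] holds the captured group of every match, in document order.
    $description = preg_replace('/\s+/', ' ', $matches[1][$n - 1]);
    return trim($description);
}

$html = "<description>No need to this</description>\n<description>I NEED THIS ONE</description>";
var_dump(nthDescription($html, 2));
```

Passing 2 selects the second tag; index 1 of $matches[1] is the same element the answer refers to as $title_matches[1][1].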
I don't know if this is the right way to go about it, but right now I am dealing with a very large text file of membership details. It is really inconsistent though, but typically conforming to this format:
Name
School
Department
Address
Phone
Email
&&^ (indicating the end of the individual record)
What I want to do with this information is read through it, and then format it into XML.
So right now I have a foreach reading through the long file like this:
<?php
$textline = file("asrlist.txt");
foreach($textline as $showline){
echo $showline . "<br>";
}
?>
And that's where I don't know how to continue. Can anybody give me some hints on how I could organize these records into XML?
Here is a straightforward solution using SimpleXML:
$text = file_get_contents('asrlist.txt'); // explode() needs a string, not the array returned by file()
$members = explode('&&^', $text); // building array $members
$xml = new SimpleXMLElement('<?xml version="1.0" encoding="UTF-8"?><members></members>');
$fieldnames = array('name','school','department','address','phone','email');
// set $fieldsep to the character(s) that separate fields from each other in your text file
$fieldsep = "\n"; // a guess: one field per line
foreach ($members as $member) {
$m = explode($fieldsep, trim($member)); // build array $m; $m[0] would contain "name" etc.
if ($m[0] === '') continue; // skip the empty chunk after the last &&^
$xmlmember = $xml->addChild('member');
foreach ($m as $key => $data) {
// ignore extra lines without a field name; escape "&", which addChild() does not handle
if (isset($fieldnames[$key])) $xmlmember->addChild($fieldnames[$key], htmlspecialchars(trim($data)));
}
} // foreach $members
$xml->asXML('mymembers.xml');
For reading and parsing the text-file, CSV-related functions could be a good alternative, as mentioned by other users.
To read big files you can use fgetcsv
If && works as a delimiter for records in that file, you could start by replacing it with </member><member>. Prepend <member> to the whole file and append </member> at the end. You will then have something XML-like.
How to replace?
You might find unix tools like sed useful.
sed 's/&&/\<\/member\>\<member\>/' <input.txt >output.xml
You can also accomplish it with PHP, using str_replace():
foreach($textline as $showline){
echo str_replace( '&&', '</member><member>', $showline ) . "<br>";
}
This is an original OFX file as it comes from my bank (no worries, there's nothing sensitive; I cut out the middle part with all the transactions).
Open Financial Exchange (OFX) is a
data-stream format for exchanging
financial information that evolved
from Microsoft's Open Financial
Connectivity (OFC) and Intuit's Open
Exchange file formats.
Now I need to parse this. I already saw that question, but this is not a dup because I am interested in how to do this.
I am sure I could figure out some clever regexps that would do the job, but that is ugly and error-prone (if the format changes, some fields may be missing, the formatting/whitespace may differ, etc.).
OFXHEADER:100
DATA:OFXSGML
VERSION:102
SECURITY:NONE
ENCODING:USASCII
CHARSET:1252
COMPRESSION:NONE
OLDFILEUID:NONE
NEWFILEUID:NONE
<OFX>
<SIGNONMSGSRSV1>
<SONRS>
<STATUS>
<CODE>0
<SEVERITY>INFO
</STATUS>
<DTSERVER>20110420000000[+1:CET]
<LANGUAGE>ENG
</SONRS>
</SIGNONMSGSRSV1>
<BANKMSGSRSV1>
<STMTTRNRS>
<TRNUID>1
<STATUS>
<CODE>0
<SEVERITY>INFO
</STATUS>
<STMTRS>
<CURDEF>EUR
<BANKACCTFROM>
<BANKID>20404
<ACCTID>02608983629
<ACCTTYPE>CHECKING
</BANKACCTFROM>
<BANKTRANLIST>
<DTSTART>20110207
<DTEND>20110419
<STMTTRN>
<TRNTYPE>XFER
<DTPOSTED>20110205000000[+1:CET]
<TRNAMT>-6.12
<FITID>C74BD430D5FF2521
<NAME>unbekannt
<MEMO>BILLA DANKT 1265P K2 05.02.UM 17.49
</STMTTRN>
<STMTTRN>
<TRNTYPE>XFER
<DTPOSTED>20110207000000[+1:CET]
<TRNAMT>-10.00
<FITID>C74BE0F90A657901
<NAME>unbekannt
<MEMO>AUTOMAT 13177 KARTE2 07.02.UM 10:22
</STMTTRN>
............................. goes on like this ........................
<STMTTRN>
<TRNTYPE>XFER
<DTPOSTED>20110418000000[+1:CET]
<TRNAMT>-9.45
<FITID>C7A5071492D14D29
<NAME>unbekannt
<MEMO>HOFER DANKT 0408P K2 18.04.UM 18.47
</STMTTRN>
</BANKTRANLIST>
<LEDGERBAL>
<BALAMT>1992.29
<DTASOF>20110420000000[+1:CET]
</LEDGERBAL>
</STMTRS>
</STMTTRNRS>
</BANKMSGSRSV1>
</OFX>
i currently use this code which gives me the desired result:
<?
$files = array();
$files[] = '***_2011001.ofx';
$files[] = '***_2011002.ofx';
$files[] = '***_2011003.ofx';
system('touch file.csv && chmod 777 file.csv');
$fp = fopen('file.csv', 'w');
foreach($files as $file) {
echo $file."...\n";
$content = file_get_contents($file);
$content = str_replace("\n","",$content);
$content = str_replace(" ","",$content);
$regex = '|<STMTTRN><TRNTYPE>(.+?)<DTPOSTED>(.+?)<TRNAMT>(.+?)<FITID>(.+?)<NAME>(.+?)<MEMO>(.+?)</STMTTRN>|';
echo preg_match_all($regex,$content,$matches,PREG_SET_ORDER)." matches... \n";
foreach($matches as $match) {
echo ".";
array_shift($match);
fputcsv($fp, $match);
}
echo "\n";
}
echo "done.\n";
fclose($fp);
this is really ugly and if this was a valid xml file i would personally kill myself for that, but how to do it better?
Your code seems fine, considering that the file isn't XML or even SGML. The only thing you could do is try to make a more generic SAX-like parser. That is, you simply go through the input stream one block at a time (where block can be anything, e.g. a line or simply a set amount of characters). Then, call a callback function every time you encounter an <ELEMENT>. You can even go as fanciful as building a parser class where you can register callback functions that listen to specific elements.
It will be more generic and less "ugly" (for some definition of "ugly") but it will be more code to maintain. Nice to do and nice to have if you need to parse this file format a lot (or in a lot of different variations). If your posted code is the only place you do this then just KISS.
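A rough sketch of the callback idea described above, under the assumption that each element sits on its own line as in the posted sample. The class name OfxLineParser and its on()/parse() methods are made up for illustration and are not part of any library:

```php
<?php
// Hypothetical sketch of a callback-driven, line-based reader for OFX-style SGML.
class OfxLineParser
{
    private $handlers = array();

    // Register a callback that receives ($tag, $value) for a given element name.
    public function on($tag, callable $callback)
    {
        $this->handlers[strtoupper($tag)] = $callback;
    }

    // Walk the input one line at a time and dispatch opening tags to their handlers.
    public function parse($text)
    {
        foreach (preg_split('/\r\n|\r|\n/', $text) as $line) {
            // Matches "<TAG>" or "<TAG>value"; closing tags like "</TAG>" are skipped.
            if (preg_match('/^\s*<([A-Za-z0-9.]+)>(.*)$/', $line, $m)) {
                $tag = strtoupper($m[1]);
                if (isset($this->handlers[$tag])) {
                    call_user_func($this->handlers[$tag], $tag, trim($m[2]));
                }
            }
        }
    }
}

$transactions = array();
$parser = new OfxLineParser();
// A new <STMTTRN> starts a new record; the field handlers fill the latest one.
$parser->on('STMTTRN', function () use (&$transactions) {
    $transactions[] = array();
});
foreach (array('TRNTYPE', 'DTPOSTED', 'TRNAMT', 'FITID', 'NAME', 'MEMO') as $field) {
    $parser->on($field, function ($tag, $value) use (&$transactions) {
        if ($transactions) {
            $transactions[count($transactions) - 1][$tag] = $value;
        }
    });
}

$parser->parse("<STMTTRN>\n<TRNTYPE>XFER\n<TRNAMT>-6.12\n<NAME>unbekannt\n</STMTTRN>");
print_r($transactions);
```

Missing or reordered fields do not break this approach, since each field is picked up independently; the trade-off is more code than the single regex.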
// Load Data String
$str = file_get_contents($fLoc);
$MArr = array(); // Final assembled master array
// Fetch all transactions
preg_match_all("/<STMTTRN>(.*)<\/STMTTRN>/msU",$str,$m);
if ( !empty($m[1]) ) {
$recArr = $m[1]; unset($str,$m);
// Parse each transaction record
foreach ( $recArr as $i => $str ) {
$_arr = array();
preg_match_all("/(^\s*<(?'key'.*)>(?'val'.*)\s*$)/m",$str,$m);
foreach ( $m["key"] as $j => $key ) { // $j avoids reusing the outer loop's $i
$_arr[$key] = trim($m["val"][$j]); // Reassemble array key => val
}
array_push($MArr,$_arr);
}
}
print_r($MArr);
function close_tags($x)
{
return preg_replace('/<([A-Za-z0-9.]+)>([^<\r\n]+)/', '<\1>\2</\1>', $x);
}
$ofx = file_get_contents('myfile.ofx');
$body = '<OFX>'.explode('<OFX>', $ofx)[1]; // strip the header
$xml = close_tags($body); // make valid XML
$reader = new SimpleXMLElement($xml);
foreach($reader->xpath('//STMTTRN') as $txn): // find and loop through all STMTTRN tags, note the double forward slash
// get the tag contents by casting as (string) to invoke the SimpleXMLElement::__toString() method
$trntype = (string)$txn->TRNTYPE;
$dtposted = (string)$txn->DTPOSTED;
$trnamt = (string)$txn->TRNAMT;
$name = (string)$txn->NAME;
$memo = (string)$txn->MEMO;
endforeach;
I am returned the following:
<links>
<image_link>http://img357.imageshack.us/img357/9606/48444016.jpg</image_link>
<thumb_link>http://img357.imageshack.us/img357/9606/48444016.th.jpg</thumb_link>
<ad_link>http://img357.imageshack.us/my.php?image=48444016.jpg</ad_link>
<thumb_exists>yes</thumb_exists>
<total_raters>0</total_raters>
<ave_rating>0.0</ave_rating>
<image_location>img357/9606/48444016.jpg</image_location>
<thumb_location>img357/9606/48444016.th.jpg</thumb_location>
<server>img357</server>
<image_name>48444016.jpg</image_name>
<done_page>http://img357.imageshack.us/content.php?page=done&l=img357/9606/48444016.jpg</done_page>
<resolution>800x600</resolution>
<filesize>38477</filesize>
<image_class>r</image_class>
</links>
I wish to extract the image_link in PHP as simply and as easily as possible. How can I do this?
Assume, I can not make use of any extra libs/plugins for PHP. :)
Thanks all
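In Josh's answer, the problem was the unescaped "/" character. PHP also has no "g" modifier (preg_match_all serves that purpose), and the captured link is in $regs[1], not $regs[0], which holds the whole match including the tags. So the code Josh submitted would become:
$text = 'string_input';
preg_match('/<image_link>([^<]+)<\/image_link>/i', $text, $regs);
$result = $regs[1];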
Taking usoban's answer, an example would be:
<?php
// Load the file into $content
$xml = new SimpleXMLElement($content) or die('Error creating a SimpleXML instance');
$imagelink = (string) $xml->image_link; // This is the image link
?>
I recommend using SimpleXML because it's very easy and, as usoban said, it's built in, which means it doesn't need any external libraries.
You can use SimpleXML as it is built in PHP.
use regular expressions
$text = 'string_input';
preg_match('/<image_link>([^<]+)</image_link>/gi', $text, $regs);
$result = $regs[0];