I need PHP code that parses specific values from a QuickBooks Online (QBO) file, which uses the OFX/QFX file format (http://en.wikipedia.org/wiki/QFX_%28file_format%29).
A section of my sample QBO file that can be used for testing follows:
OFXHEADER:100
DATA:OFXSGML
VERSION:102
SECURITY:NONE
ENCODING:USASCII
CHARSET:1252
COMPRESSION:NONE
OLDFILEUID:NONE
NEWFILEUID:NONE
<OFX><SIGNONMSGSRSV1><SONRS><STATUS><CODE>0<SEVERITY>INFO</STATUS><DTSERVER>20150518082838<LANGUAGE>ENG<FI><ORG>Bank Name<FID>1234</FI><INTU.BID>1234<INTU.USERID>123456789012</SONRS></SIGNONMSGSRSV1>
<BANKMSGSRSV1><STMTTRNRS><TRNUID>0<STATUS><CODE>0<SEVERITY>INFO</STATUS><STMTRS><CURDEF>USD<BANKACCTFROM><BANKID>123456789<ACCTID>12345678901<ACCTTYPE>CHECKING</BANKACCTFROM><BANKTRANLIST><DTSTART>20140204235959<DTEND>20150512235959
<STMTTRN><TRNTYPE>DIRECTDEBIT<DTPOSTED>20140204235959<TRNAMT>-000000056.32<FITID>2014000000000000000000000000000000000000000000000000000<NAME>ELECT PWR<MEMO>WEB</STMTTRN>
</BANKTRANLIST><LEDGERBAL><BALAMT>123.45<DTASOF>20150515235959</LEDGERBAL><AVAILBAL><BALAMT>123.45<DTASOF>20150515235959</AVAILBAL></STMTRS>
</STMTTRNRS></BANKMSGSRSV1></OFX>
I am having trouble getting values from a QBO file into an array in PHP. I've looked into various utilities such as QBO2CSV (http://www.propersoft.net/qbo2csv/home) and FixOFX (https://github.com/wesabe/fixofx), but I would like to do this with PHP code alone if at all possible. QBO2CSV almost works if I use the command line to convert the QBO to a CSV and then parse the CSV, but doing it all in PHP would cut out a few steps.
I also have trouble cleaning up the QBO for SimpleXMLElement: QBO files are very non-standard XML, and I have been unable to clean one up enough for SimpleXMLElement to accept it. The best example I have found for this is at: http://www.ibm.com/developerworks/library/x-ofxv2/
...and it almost works. It is the closest code solution I have found, but it still doesn't produce the results. It also tries to use SimpleXMLElement after cleaning up the QBO, and it too has difficulty cleaning the QBO up enough for SimpleXMLElement to accept it.
Parts of my attempted solution are below, but I am having difficulty with the XML angle brackets.
My code:
// READ CONTENTS OF FILE TO STRING
$cont = file_get_contents('C:\xampp\htdocs\testparse\test.qbo');
// STRIP OUT HEADER
$bline = strpos($cont,"<OFX>");
$head = substr($cont,0,$bline-2);
$ofx = substr($cont,$bline-1);
// 3. Examine tags that might be improperly terminated
$ofxx = $ofx;
// NUMBER OF TAGS
$numtags = substr_count($ofxx, '<');
// INIT LOOP
$tagloop = 1;
// PARSE THROUGH TAGS
while ($tagloop <= $numtags){
$tagloop++;
$pos = strpos($ofxx,'<');
$pos2 = strpos($ofxx,'>');
$ele = substr($ofxx,$pos+1,$pos2-$pos-1);
// FIND TAGS AND MAKE SURE THEY ARE IN HTML FORMAT WITH BRACKETS:
$tagstart = "<";
$tagend = ">";
$omittag = $tagstart . $ele . $tagend;
//FIND END OF TAG
$pos3 = strpos($omittag,'>');
$pos4 = $pos3+1;
//STRIP TAG OF ANY REMAINING CHARS AFTER THE ">" CHAR
//FOR SOME REASON OCCASIONALLY THE STRING WOULD BE LONGER THAN INTENDED, SO THIS MAKES SURE IT IS CUT OFF AFTER ">"
$omittag2 = substr($omittag, 0, $pos4);
//REMOVE TAG FROM MAIN STRING
$ofxx = preg_replace($omittag2, '', $ofxx, 1);
// TROUBLES OCCUR HERE...I CAN'T SEEM TO BE ABLE TO GET RID OF EMPTY <> and > CHARS...NOT SURE WHY THEY ARE HERE SINCE THE ABOVE SHOULD HAVE REMOVED ALL OF THE TAG BUT SOMETIMES "<>" OR ">" REMAIN
// WHAT I AM THEN TRYING TO DO IS TO GRAB THIS TAG'S NAME AND THEN EVENTUALLY STORE IT IN AN ARRAY ALONG WITH ITS DATA. SINCE QBOs DO NOT HAVE TERMINATING TAGS THEY EITHER NEED CONVERTING TO SELF TERMINATING TAGS OR JUST TRY AND GRAB A TAG VALUE, AND THE DATA THAT FOLLOWS IT AS THE DATA FOR THAT TAG
//FIND START OF NEXT TAG
$pos5 = strpos($ofxx,'<');
//USE THE START OF POS5 OF THAT TAG TO GRAB DATA FOR THE CURRENT TAG IF POS5 GREATER THAN ZERO
if ($pos5 > 0) {
$tagdata = substr($ofxx, 0, $pos5);
}
// 5. Deal with special characters
$ofxx = str_replace('&','&amp;',$ofxy);
} // END LOOP
I think my biggest issue is dealing with the "<" and ">" characters. I am having trouble removing them as I go through the string and parse out the values.
Once the correct values are echoed, I will add them to an array and then insert them into a MySQL database.
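One possible approach to the "grab a tag name and the data that follows it" idea described above is to skip XML conversion and match the tag/value pairs directly. The following is only a minimal sketch: the function name parseOfxTransactions is hypothetical, and it assumes every leaf value sits directly after its opening tag and never contains a "<" character, as in the sample file.

```php
<?php
// Hypothetical helper: returns one associative array per <STMTTRN> block.
function parseOfxTransactions(string $ofx): array
{
    $transactions = [];
    // Aggregate elements such as <STMTTRN> do have closing tags in the sample.
    preg_match_all('#<STMTTRN>(.*?)</STMTTRN>#s', $ofx, $blocks);
    foreach ($blocks[1] as $block) {
        $fields = [];
        // A leaf element is "<TAG>" followed by its value up to the next "<".
        preg_match_all('#<([A-Z0-9.]+)>([^<\r\n]+)#', $block, $pairs, PREG_SET_ORDER);
        foreach ($pairs as $pair) {
            $fields[$pair[1]] = trim($pair[2]);
        }
        $transactions[] = $fields;
    }
    return $transactions;
}

// Sample transaction shortened from the test file above (FITID abbreviated).
$sample = '<STMTTRN><TRNTYPE>DIRECTDEBIT<DTPOSTED>20140204235959'
        . '<TRNAMT>-000000056.32<FITID>2014<NAME>ELECT PWR<MEMO>WEB</STMTTRN>';
$rows = parseOfxTransactions($sample);
echo $rows[0]['NAME'], ' ', (float) $rows[0]['TRNAMT'], "\n";
```

Because closing tags start with "</", the leaf pattern never matches them, so no tag removal step is needed. The resulting arrays could then be passed on to the MySQL insert.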
The API fails when the "descriptions" field contains a newline (the user pressed Enter).
[Image: all parameters sent by users]
The code below reads the data from the posted JSON.
// get posted data
$jason_value = json_decode(file_get_contents("php://input"));
$crm_id = $jason_value->data->crmId;
$descriptions = $jason_value->data->descriptions;
I would like to accept descriptions as a single-line string, for example:
descriptions = "10+ windows modern style 7057655959".
I do not have access to the program where the user enters the description, so I cannot add validation there or convert the line break to \n.
This is the string I get after conversion:
{ "jwt": "eyJ0", "data": { "crmId": "15876047", "geoconceptAppointmentId": "15876","geoconceptCustomerId": "15876047","status": "Rejected","appointmentDateTime": "","firstName": "Nick Test","lastName": "PA","address": "9112 RUE Tom","city": "MONTREAL","state": "QC","zip": "H2N1T1","country": "CAN","phoneNumber1": "5148332222","phoneNumber2": "5148332222","email": "nbskgg#gmail.com","dateEntry": "2019-06-20 12:02","dateModify": "2019-06-20 12:02","preferredWayToContact": "","textMsgFlag": "Y","hearAboutUs": "Referral","perferredTime": "Anytime","descriptions": "I have to call at 5" pm. ","worklog": "This is the comment ","rejectReason": "Area | Region","referredByDC": "09999","referredByStoreUsername": "store215","assignedUsername": "","createdByUsername": "np","modifiedByUsername": "np","btgMarket": "Montreal"}}
You're correct that in PHP 7+, a literal tab or newline inside a string value causes the JSON parsing to fail. file_get_contents("php://input") returns a single string, so I see no reason why you couldn't just filter that before you attempt to parse it. But maybe I'm missing something.
//Catch DOS (\r\n), Unix (\n) and old Mac (\r) line endings; "\r\n" is listed first so it becomes a single space
$filter = array("\r\n", "\n", "\r");
$replace = " ";
$cleanJSON = str_replace($filter, $replace, file_get_contents("php://input"));
$data = json_decode($cleanJSON);
I want to point out that after this point, your code is referencing a variable that does not exist: $jason_value
$crm_id = $jason_value->data->crmId;
$descriptions = $jason_value->data->descriptions;
To reference properties of the object you just created, start from $data and keep the nested "data" key from the payload:
$crm_id = $data->data->crmId;
$descriptions = $data->data->descriptions;
I expect you'd want to replace the newline with a space. If what you're actually receiving already has a space before the newline, you may want an empty string instead, but that's impossible to tell from what we have.
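Put together, the filtering step might look like the sketch below. The function name decodeLenientJson is hypothetical, and the raw string stands in for file_get_contents("php://input"). Note that this only repairs raw line breaks and tabs; an unescaped double quote inside a value (such as the 5" pm in the sample payload) still makes the JSON invalid and cannot be fixed this way.

```php
<?php
// Hypothetical helper: replaces raw line breaks and tabs, then decodes.
// Returns null when the cleaned string is still not a valid JSON object.
function decodeLenientJson(string $raw): ?stdClass
{
    // "\r\n" comes first so a DOS line ending becomes one space, not two.
    $clean = str_replace(["\r\n", "\n", "\r", "\t"], ' ', $raw);
    $decoded = json_decode($clean);
    if (json_last_error() !== JSON_ERROR_NONE || !($decoded instanceof stdClass)) {
        return null;
    }
    return $decoded;
}

// Stand-in for file_get_contents("php://input"): a value with a raw newline.
$raw = "{ \"data\": { \"crmId\": \"15876047\", \"descriptions\": \"10+ windows\nmodern style\" } }";
$payload = decodeLenientJson($raw);
if ($payload !== null) {
    echo $payload->data->descriptions, "\n";
}
```

Returning null on failure lets the caller respond with an error instead of reading properties from a value that was never decoded.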
I am new to PHP. As part of a course homework assignment, I am required to extract data from a website and render a table from that data.
P.S.: I know regex is not a good option for this, but we are not allowed to use any library such as DOM, jQuery, etc.
The character set is UTF-8.
$searchURL = "http://www.allmusic.com/search/artists/the+beatles";
$html = file_get_contents($searchURL);
$patternform = '/<form(.*)<\/form>/sm';
preg_match_all($patternform ,$html,$matches);
Here the regex works fine, but when I apply the same regex to the table tag, it returns an empty array. Does it have something to do with whitespace in $html?
What is wrong here?
The following code produces a good result:
$searchURL = "http://www.allmusic.com/search/artists/the+beatles";
$html = file_get_contents($searchURL);
$patternform = '/(<table.*<\/table>)/sm';
preg_match_all($patternform ,$html,$matches);
echo $matches[0][0];
Result:
How do I ignore HTML tags in this preg_replace?
I have a foreach loop for a search, so if someone searches for "apple span", the preg_replace also wraps the word "span" inside the highlight markup it has just added, and the HTML breaks:
preg_replace("/($keyword)/i","<span class=\"search_hightlight\">$1</span>",$str);
Thanks in advance!
I suggest basing your function on DOMDocument and DOMXPath rather than regular expressions. Even though regular expressions are quite powerful, you run into problems like the one you describe, which are not always easy to solve robustly with them.
The general saying is: Don't parse HTML with regular expressions.
It's a good rule to keep in mind. As with any rule, it does not always apply, but it is worth thinking it through before deciding.
XPath allows you to find all text that contains the search terms, looking only at the text and ignoring the XML elements.
Then you only need to wrap those texts in the <span> and you're done.
Edit: Finally some code ;)
First it uses XPath to locate elements that contain the search text. My query looks like this; it could probably be written better, as I'm not an XPath expert:
'//*[contains(., "'.$search.'")]/*[FALSE = contains(., "'.$search.'")]/..'
$search contains the text to search for and must not contain any " (double quote) character (this would break the query; see Cleaning/sanitizing xpath attributes for a workaround if you need quotes).
This query returns all parent elements whose text nodes, put together, form a string that contains your search term.
As such a list is not easy to process further as-is, I created a TextRange class that represents a list of DOMText nodes. It is useful for doing string operations on a list of text nodes as if they were one string.
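For the quote limitation mentioned above, one common workaround is to build the XPath string literal with concat() when the search text contains both quote types. The helper below is a sketch of that idea, not part of the original routine, and the name xpathLiteral is hypothetical.

```php
<?php
// Hypothetical helper: turns an arbitrary string into an XPath 1.0 string literal.
function xpathLiteral(string $value): string
{
    if (strpos($value, '"') === false) {
        return '"' . $value . '"';      // no double quotes: wrap in double quotes
    }
    if (strpos($value, "'") === false) {
        return "'" . $value . "'";      // only double quotes: wrap in single quotes
    }
    // Both quote types present: split on " and rejoin the pieces via concat().
    $parts = explode('"', $value);
    return 'concat("' . implode('", \'"\', "', $parts) . '")';
}

$search = 'text that "span"';
// The literal replaces the hand-written '"' . $search . '"' in the query string.
$query = '//*[contains(., ' . xpathLiteral($search) . ')]';
echo $query, "\n";
```

XPath 1.0 has no escape character inside string literals, which is why the concat() form is needed at all.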
This is the base skeleton of the routine:
$str = '...'; # some XML
$search = 'text that span';
printf("Searching for: (%d) '%s'\n", strlen($search), $search);
$doc = new DOMDocument;
$doc->loadXML($str);
$xp = new DOMXPath($doc);
$anchor = $doc->getElementsByTagName('body')->item(0);
if (!$anchor)
{
throw new Exception('Anchor element not found.');
}
// search elements that contain the search-text
$r = $xp->query('//*[contains(., "'.$search.'")]/*[FALSE = contains(., "'.$search.'")]/..', $anchor);
if (!$r)
{
throw new Exception('XPath failed.');
}
// process search results
foreach($r as $i => $node)
{
$textNodes = $xp->query('.//child::text()', $node);
// extract $search textnode ranges, create fitting nodes if necessary
$range = new TextRange($textNodes);
$ranges = array();
while(FALSE !== $start = strpos($range, $search))
{
$base = $range->split($start);
$range = $base->split(strlen($search));
$ranges[] = $base;
}
// wrap each matching textnode
foreach($ranges as $range)
{
foreach($range->getNodes() as $node)
{
$span = $doc->createElement('span');
$span->setAttribute('class', 'search_hightlight');
$node = $node->parentNode->replaceChild($span, $node);
$span->appendChild($node);
}
}
}
For my example XML:
<html>
<body>
This is some <span>text</span> that span across a page to search in.
and more text that span</body>
</html>
It produces the following result:
<html>
<body>
This is some <span><span class="search_hightlight">text</span></span><span class="search_hightlight"> that span</span> across a page to search in.
and more <span class="search_hightlight">text that span</span></body>
</html>
This shows that the approach can even find text that is distributed across multiple tags. That is not easily possible with regular expressions.
You can find the full code here: http://codepad.viper-7.com/U4bxbe (including the TextRange class that I have left out of the answer's example).
It does not work properly on the viper codepad because of the older LIBXML version that site uses. It works fine with my LIBXML version 20707. I created a related question about this issue: XPath query result order.
A note of warning: This example uses binary string search (strpos) and the resulting offsets for splitting text nodes with the DOMText::splitText function. That can lead to wrong offsets, as the function needs the UTF-8 character offset. The correct method is to use mb_strpos to obtain the character-based value.
The example works anyway because it only uses US-ASCII, where byte offsets and UTF-8 character offsets are the same for the example data.
For a real-life situation, the $search string should be UTF-8 encoded and mb_strpos should be used instead of strpos:
while(FALSE !== $start = mb_strpos($range, $search, 0, 'UTF-8'))
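To see why the offsets differ, here is a small standalone comparison. It is only an illustration with a made-up sample string, and it assumes the mbstring extension is installed.

```php
<?php
// "\u{00EF}" is the character ï, which takes two bytes in UTF-8.
$text   = "na\u{00EF}ve text that span";
$search = 'text';

$byteOffset = strpos($text, $search);                 // counts bytes
$charOffset = mb_strpos($text, $search, 0, 'UTF-8');  // counts characters

echo $byteOffset, ' ', $charOffset, "\n";
```

The two offsets differ by one here because of the single two-byte character before the match; passing the byte offset to DOMText::splitText would split one character too late.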
I am trying to parse an XML file and display the first 150 words of each article with a READ MORE link. It doesn't parse exactly 150 words, though. I am also not sure how to keep it from including IMG tag code, etc. The code is below:
// Script displays 3 most recent blog posts from blog.pinchit.com (blog.pinchit.com/api/read)
// The entries on homepage show the first 150 words of description and "READ MORE" link
// PART 1 - PARSING
// if it was a JSON file
// $string=file_get_contents("http://blog.pinchit.com/api/read");
// $json_a=json_decode($string,true);
// var_export($json_a);
// XML Parsing
$file = "http://blog.pinchit.com/api/read";
$posts_to_display = 3;
$posts = array();
// get all the file nodes
if(!$xml=simplexml_load_file($file)){
trigger_error('Error reading XML file',E_USER_ERROR);
}
// counter for posts member array
$counter = 0;
// Accessing elements within an XML document that contain characters not permitted under PHP's naming convention
// (e.g. the hyphen) can be accomplished by encapsulating the element name within braces and the apostrophe.
foreach($xml->posts->post as $post){
//post's title
$posts[$counter]['title'] = $post->{'regular-title'};
// post's full body
$posts[$counter]['body'] = $post->{'regular-body'};
// post's body's first 150 words
//for some reason, I am not sure if it's exactly 150
$posts[$counter]['preview'] = substr($posts[$counter]['body'], 0, 150);
//strip all the html tags so it doesn't mess up the page
$posts[$counter]['preview'] = strip_tags($posts[$counter]['preview']);
//post's id
$posts[$counter]['id'] = $post->attributes()->id;
$posts_to_display--;
$counter++;
//exit the for loop after we parse out all the articles that we want
if ($posts_to_display == 0 ) break;
}
// Displays all of the posts
foreach($posts as $post){
echo "<b>" . $post['title'] . "</b>";
echo "<br/>";
echo $post['preview'];
echo " <a href='http://blog.pinchit.com/post/" . $post['id'] . "'>Read More</a>";
echo "<br/><br/>";
}
Here is how the results look now.
Editor's Pick: Club Sportiva
Nothing makes you feel as totally free and in control as a day behind the wheel of a sleek, sophisticated, sexy sports car. It’s no surprise Read More
Pinchy Drinks & Rocks: The Hotel Utah Saloon
Hotel Utah Read More
Monday Menu: Spicy Grapefruit, Paprika, Creamsicles
Feeling summery and savory today, and we have to admit it took a lot to resist the urge to make this an all appetizers, all desserts, or all drinks Read More
The HTML tags are counting against your character total. Strip the tags out first, then take your preview sample:
$preview = strip_tags($posts[$counter]['body']);
$posts[$counter]['preview'] = substr($preview, 0, 150).'...';
Also, one usually adds an ellipsis ("...") to the end of truncated text to indicate that it continues.
Note that this has the potential disadvantage of removing tags you DO want, like <p> and <br>. If you want to preserve those, you can pass them as the second argument to strip_tags:
$preview = strip_tags($posts[$counter]['body'], '<br><p>');
$posts[$counter]['preview'] = substr($preview, 0, 150).'...';
BUT, be forewarned that XML-style tags (<br />) might throw this off. If you're dealing with mixed XML/HTML, you might need to step up your tag filtering with something like htmLawed, but the concept remains the same: get rid of the HTML before you truncate.
Looking at the <regular-body> tag, it seems to contain HTML. Therefore I would recommend parsing it into a DOMDocument ( http://www.php.net/manual/en/domdocument.loadhtml.php ). You would then be able to loop through all the items and ignore certain tags (e.g. ignore <img> but keep <p>). After that, you can render what you want and truncate it to 150 characters.
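Since the question asks for 150 words while substr() counts characters, a word-based preview may be closer to what is wanted. The following is a rough sketch; the function name previewWords and the sample markup are made up for illustration.

```php
<?php
// Hypothetical helper: plain-text preview limited to a number of words.
function previewWords(string $html, int $maxWords = 150): string
{
    // Remove the markup first so tags do not count toward the limit.
    $text  = trim(strip_tags($html));
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    if (count($words) <= $maxWords) {
        return implode(' ', $words);    // short enough: no ellipsis needed
    }
    return implode(' ', array_slice($words, 0, $maxWords)) . '...';
}

// Made-up body with an image tag; limited to 4 words for the demonstration.
echo previewWords('<p>Hotel <img src="x.jpg"> Utah is a <b>saloon</b> in town</p>', 4), "\n";
```

In the loop from the question, the call would be previewWords((string) $post->{'regular-body'}), with the cast turning the SimpleXMLElement into a plain string.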
I have a file which reads as follows
<<row>> 1|test|20110404<</row>>
<<row>> 1|test|20110404<</row>>
<<row>> and <</row>> indicate the start and end of a line. I want to read the line between these tags and also check whether the tags are present.
The first thing you need to do is locate the position of this "tag". The strpos() function does just that.
$tag_pos=strpos('<> 1|test|20110404<> <> 1|test|20110404<>', '<>');
if ($tag_pos===false) {
//The tag was not found!
} else {
//$tag_pos equals the numeric position of the first character of your tag
}
If these are truly lines, an efficient way to get them all is just to split on <>.
$lines=explode('<>', '<> 1|test|20110404<> <> 1|test|20110404<>');
$lines=array_filter($lines); //Removes blank strings from array
You could improve this by adding a callback function to the array_filter() call that uses trim() to remove any whitespace and then see if it is blank or not.
Edit: Great, I see that your "tags" were missing from your post. Since your start and end tags do not match, the code above will be of little use to you. Let me try again...
function strbetweenstrs($source, $tag1, $tag2, $casesensitive=true) {
    $results = array();
    $whatsleft = $source;
    while ($whatsleft != '') {
        // locate the start tag (stripos is the case-insensitive variant)
        $pos1 = $casesensitive ? strpos($whatsleft, $tag1) : stripos($whatsleft, $tag1);
        if ($pos1 === false) {
            break; // no further start tag
        }
        $start = $pos1 + strlen($tag1);
        // locate the end tag, searching only after the start tag
        $pos2 = $casesensitive ? strpos($whatsleft, $tag2, $start) : stripos($whatsleft, $tag2, $start);
        if ($pos2 === false) {
            break; // start tag without a matching end tag
        }
        $results[] = substr($whatsleft, $start, $pos2 - $start);
        $whatsleft = substr($whatsleft, $pos2 + strlen($tag2));
    }
    return $results;
}
Note that I haven't tested this... but you get the general idea. There is probably a much more efficient way to go about doing it.
Creating your own format is not so hard, but creating a script to read it can be difficult.
The advantage of using standardized formats is that most programming languages already support them. For example:
XML: You can use the simplexml_load_string() function, which lets you navigate through your content easily.
$str = '<?xml version="1.0" encoding="utf-8"?>
<data>
<row>1|test|20110404</row>
<row>1|test|20110404</row>
</data>';
$xml = simplexml_load_string($str);
Now you can access your data:
echo $xml->row[0];
echo $xml->row[1];
I'm sure you get the idea.
There is also very good support for JSON (JavaScript Object Notation) using the json_decode() function.
Check php.net for more details.
I would suggest using preg_match:
preg_match('#<<row>>(.*)<</row>>#', $line, $matches);
if( ! empty($matches))
{
// line was found
print_r( $matches[1] ); // will contain the content between the start and end row tags
}
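To handle a whole file rather than one line at a time, the same idea can be extended with preg_match_all. This is a sketch only: the function name rowsBetweenTags is hypothetical, and it assumes the literal tags are <<row>> and <</row>> as shown in the question.

```php
<?php
// Hypothetical helper: returns the trimmed content of every <<row>>...<</row>> pair.
// An empty array means no complete pair of tags was found.
function rowsBetweenTags(string $source, string $open = '<<row>>', string $close = '<</row>>'): array
{
    // preg_quote() escapes the tags so they are matched literally.
    $pattern = '#' . preg_quote($open, '#') . '(.*?)' . preg_quote($close, '#') . '#s';
    preg_match_all($pattern, $source, $matches);
    return array_map('trim', $matches[1]);
}

$input = "<<row>> 1|test|20110404<</row>>\n<<row>> 2|demo|20110405<</row>>";
foreach (rowsBetweenTags($input) as $row) {
    // Each row is pipe-delimited, so explode() yields the individual fields.
    print_r(explode('|', $row));
}
```

The lazy (.*?) keeps each match within a single pair of tags even when several rows share one line.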