I am parsing an HTML file. I have a big string which is basically a script.
The string looks like this:
var spConfig = new
Product.Config({"outofstock":["12663"],"instock":["12654","12655","12656","12657","12658","12659","12660","12661","12662","12664","12665"],"attributes":{"698":{"id":"698","code":"aubade_import_colorcode","label":"Colorcode","options":[{"id":"650","label":"BLUSH","price":"0","products":["12654","12655","12656","12657","12658","12659","12660","12661","12662","12663","12664","12665"]}]},"689":{"id":"689","code":"aubade_import_size_width","label":"Size
Width","options":[{"id":"449","label":"85","price":"0","products":["12654","12657","12660","12663"]},{"id":"450","label":"90","price":"0","products":["12655","12658","12661","12664"]},{"id":"451","label":"95","price":"0","products":["12656","12659","12662","12665"]}]},"702":{"id":"702","code":"aubade_import_size_cup","label":"Size
Cup","options":[{"id":"1501","label":"A","price":"0","products":["12654","12655","12656"]},{"id":"1502","label":"B","price":"0","products":["12657","12658","12659"]},{"id":"1503","label":"C","price":"0","products":["12660","12661","12662"]},{"id":"1504","label":"D","price":"0","products":["12663","12664","12665"]}]}},"template":"\u20ac#{price}","basePrice":"57","oldPrice":"57","productId":"12666","chooseText":"Choose
option...","taxConfig":{"includeTax":true,"showIncludeTax":true,"showBothPrices":false,"defaultTax":19.6,"currentTax":19.6,"inclTaxTitle":"Incl.
Tax"}});
var colorarray = new Array();
colorarray["c650"] = 'blush';
Event.observe('attribute698', 'change', function() {
var colorId = $('attribute698').value;
var attribute = 'attribute698';
var label = colorarray["c"+colorId];
if ($('attribute698').value != '') {
setImages(attribute, colorId, label);
}
}); // var currentColorLabel = 'blush'; // var currentSku = '5010-4-n'; // var currentPosition = 'v'; // //
Event.observe(window, 'load', function() { //
setImages('attribute698', null, currentColorLabel); // });
I need to extract the content from the first "(" up to the first ";".
I have tried string extraction and preg_match, and both failed.
Please suggest a solution. My attempts and the issues I hit are below.
$strScript = $tagscript->item(0)->nodeValue;
//this line returns empty string
$str_slashed = addslashes(trim($strScript) );
$pattern = '/\((.*);/';
preg_match($pattern,$str_slashed,$matches);
echo 'matches'."<br />";
var_dump($matches);
//Add slashes works only if I use it before assignment to other string
$matches = array();
$strScript = addslashes ($tagscript->item(0)->nodeValue);//. "<br />";
$pattern = '/\((.*);/';
preg_match($pattern,$strScript,$matches);
echo 'matches'."<br />";
var_dump($matches);
//str extract method
$posBracket = stripos ($strScript,'(');
echo $posBracket."<br />";
$posSemiColon = strpos ($strScript,';');
echo $posSemiColon."<br />";
$temp = mb_substr ($strScript,$posBracket ,($posSemiColon-$posBracket));
echo $temp."<br />";
The above code works for small strings such as
$strScript = "manisha( [is goo girl] {come(will miss u) \and \"play} ; lets go home;";
but it won't work for the long strings.
How can I resolve this issue? Please help me!
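You have to add the s (PCRE_DOTALL) modifier to your regular expression. Without it, . does not match a newline, so the match cannot cross a line break.
Try $pattern = '/\((.*);/s';. The m modifier only changes how ^ and $ behave, so it will not help here. Note that .* is greedy and runs to the last ; in the string; use .*? to stop at the first one.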
Try using /\(([^;]*)/ as your pattern. [^;] means any character that is not a ;.
Edit: the s modifier suggested by rogers is harmless here but not required, because a negated class such as [^;] already matches newlines; /\(([^;]*)/s behaves the same.
Edit: you should be aware that this is not really error-proof. For example, it breaks if a ; appears inside some property of the object whose JSON representation is included in your string.
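A minimal sketch of the [^;] approach, assuming a shortened, hypothetical stand-in for the real script text (the $script value below is not the original page) and that no ; occurs inside the JSON itself:

```php
<?php
// Hypothetical, shortened stand-in for the script; note the line breaks inside it.
$script = <<<'JS'
var spConfig = new
Product.Config({"productId":"12666","chooseText":"Choose
option..."});
var colorarray = new Array();
JS;

$json = null;
// [^;] also matches newlines, so no "s" modifier is needed.
if (preg_match('/\(([^;]*);/', $script, $m)) {
    // $m[1] is everything after the first "(" and before the first ";".
    // Drop the ")" that closes Product.Config( ... ).
    $json = rtrim($m[1], ')');
    echo $json, "\n";
}
```

The rtrim() step is a design choice for this sample only: it assumes the captured text ends with the closing parenthesis of the call.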
I have a Text.xml file with some text and bibliographic references in that text. It looks like this:
Text.xml
<p>…blabla S.King (1987). Bla bla bla J.Doe (2001) blabla bla J.Martin (1995) blabla…</p>
And I have a Reference.txt file with a list of bibliographic references and an ID number for each reference. It looks like this:
Reference.txt
b1#S.King (1987)
b2#J.Doe (2001)
b3#J.Martin (1995)
I would like to find all the bibliographic references from Reference.txt in Text.xml and then wrap each one in a tag carrying its ID. The goal is a TextWithReference.xml file that looks like this:
TextWithReference.xml
<p>…blabla <ref type="biblio" target="b1">S.King (1987)</ref>. Bla bla bla <ref type="biblio" target="b2">J.Doe (2001)</ref> blabla bla <ref type="biblio" target="b3">J.Martin (1995)</ref> blabla…</p>
To do this, I use a php file.
Search&Replace.php
<?php
$handle = fopen("Reference.txt","r");
while(!feof($handle))
{
$ligne = fgets($handle,1024);
$tabRef[] = $ligne;
}
fclose($handle);
$handleXML = fopen("Text.xml","r");
$fp = fopen("TextWithReference.xml", "w");
while(!feof($handleXML))
{
$ligneXML = fgets($handleXML,2048);
for($i=0;$i<sizeof($tabRef);$i++)
{
$tabSearch = explode('/#/',$tabRef[$i]);
$xmlID = $tabSearch[0];
$searchString = trim($tabSearch[1]);
if(preg_match('/$searchString/',$ligneXML))
{
$ligneXML = preg_replace('/($searchString)/','/<ref type=\"biblio\" target=\"#$xmlID\">\\0</ref>/',$ligneXML);
}
}
fwrite($fp, $ligneXML);
}
fclose($handleXML);
fclose($fp);
?>
The problem is that this PHP script just copies Text.xml to TextWithReference.xml without identifying the bibliographic references and without adding the tags…
Many thanks for your help!
There are a number of problems with your code.
The search strings contain characters that are special in regular expressions, such as parentheses. You need to escape these if you want to match them literally. The preg_quote function does this.
Your file-reading loops are not correct. while (!feof()) is not the correct way to read through a file, because the EOF flag isn't set until after you read at the end of the file. So you'll go through the loops an extra time. The proper way to write this is while ($ligne = fgets()).
You have single quotes around the strings where you're trying to substitute $searchString and $xmlID. Variables are only substituted inside double quotes. See What is the difference between single-quoted and double-quoted strings in PHP?
You don't need to put / delimiters around the replacement string in preg_replace.
It's inefficient to explode, trim and escape the lines from the Reference.txt every time you're processing a line in Text.xml. Do it once when you're reading Reference.txt.
In the replacement string, use $0 to refer to the matched text from the source. \0 also works, but the PHP manual names the $n form as the preferred one.
You don't need parentheses around the search string in the regexp, since you're not using the $1 capture group in the replacement. And since it's around the whole regexp, it's the same as $0.
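The quoting and escaping points above can be seen in isolation. This is a small sketch on one in-memory line rather than the real files; the sample values are taken from the question:

```php
<?php
$searchString = 'S.King (1987)';
$xmlID = 'b1';
$line = '<p>blabla S.King (1987). Bla bla</p>';

// Single quotes: the variable is not expanded, so the regexp is the literal
// text "$searchString" and cannot match.
$single = preg_match('/$searchString/', $line);

// Double quotes, but unescaped: "(1987)" becomes a capture group, so the
// literal parentheses in the text are not matched.
$unescaped = preg_match("/$searchString/", $line);

// Escaped with preg_quote (delimiter passed as second argument): matches.
$pattern = '/' . preg_quote($searchString, '/') . '/';
$result = preg_replace(
    $pattern,
    "<ref type=\"biblio\" target=\"#$xmlID\">$0</ref>",
    $line
);

echo $single, ' ', $unescaped, "\n", $result, "\n";
```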
Here's the working rewrite:
<?php
$handle = fopen("Reference.txt","r");
$tabRef = array();
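while (($ligne = fgets($handle,1024)) !== false) {
$ligne = trim($ligne);
if ($ligne === '') continue; // skip blank lines instead of stopping at them
list($xmlID, $searchString) = explode('#', $ligne, 2);
// pass "/" so the regexp delimiter is escaped as well
$tabRef[] = array($xmlID, preg_quote($searchString, '/'));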
}
fclose($handle);
$handleXML = fopen("Text.xml","r");
$fp = fopen("TextWithReference.xml", "w");
while($ligneXML = fgets($handleXML,2048)) {
foreach ($tabRef as $tabSearch) {
$xmlID = $tabSearch[0];
$searchString = $tabSearch[1];
if(preg_match("/$searchString/",$ligneXML)) {
$ligneXML = preg_replace("/$searchString/","<ref type=\"biblio\" target=\"#$xmlID\">$0</ref>",$ligneXML);
}
}
fwrite($fp, $ligneXML);
}
fclose($handleXML);
fclose($fp);
?>
Another improvement takes advantage of the ability to pass arrays as the search and replacement arguments to preg_replace, instead of using a loop. When reading Reference.txt, create the regexp and replacement strings there, and put each into its own array.
<?php
$handle = fopen("Reference.txt","r");
$search = array();
$replacement = array();
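while (($ligne = fgets($handle,1024)) !== false) {
$ligne = trim($ligne);
if ($ligne === '') continue; // skip blank lines instead of stopping at them
list($xmlID, $searchString) = explode('#', $ligne, 2);
// pass "/" so the regexp delimiter is escaped as well
$search[] = "/" . preg_quote($searchString, '/') . "/";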
$replacement[] = "<ref type=\"biblio\" target=\"#$xmlID\">$0</ref>";
}
fclose($handle);
$handleXML = fopen("Text.xml","r");
$fp = fopen("TextWithReference.xml", "w");
while($ligneXML = fgets($handleXML,2048)) {
$ligneXML = preg_replace($search,$replacement,$ligneXML);
fwrite($fp, $ligneXML);
}
fclose($handleXML);
fclose($fp);
?>
I have images with names such as:
img-300x300.jpg
img1-250x270.jpg
These names will be stored in a string variable. My image is in WordPress, so it will be located at e.g.
mywebsite.com/wp-content/uploads/2012/11/img-300x300.jpg
and I need the string to be changed to
mywebsite.com/wp-content/uploads/2012/11/img.jpg
I need a PHP regular expression which would return img.jpg and img1.jpg as the names.
How do I do this?
Thanks
Addition
Sorry guys, I had tried this but it didn't work
$string = 'img-300x300.jpg'
$pattern = '[^0-9\.]-[^0-9\.]';
$replacement = '';
echo preg_replace($pattern, $replacement, $string);
You can do this with PHP's native string functions.
<?php
function genLink($imagelink)
{
$img1 = basename($imagelink);
// strrpos for the dot too, so a file name with several dots keeps only its real extension
$img = substr($img1,0,strrpos($img1,'-')).substr($img1,strrpos($img1,'.'));
$modifiedlink = substr($imagelink,0,strrpos($imagelink,'/'))."/".$img;
return $modifiedlink;
}
echo genLink('mywebsite.com/wp-content/uploads/2012/11/flower-img-color-300x300.jpg');
OUTPUT :
mywebsite.com/wp-content/uploads/2012/11/flower-img-color.jpg
You can do that as:
(img\d*)-([^.]*)(\..*)
and \1\3 will contain what you want:
Demo: http://regex101.com/r/vU2mD4
Or, replace (img\d*)-([^.]*)(\..*) with \1\3
Maybe this?
(\w+)-[^.]+?(\.\w+)
The $1$2 will give you what you want.
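One caveat with loose patterns like these: applied to the full URL, the first hyphen they meet is the one in wp-content. A possible sketch that anchors the size suffix to the end of the string instead; the helper name stripSizeSuffix and the -<width>x<height> assumption are mine, not from the question:

```php
<?php
// Hypothetical helper: removes "-<width>x<height>" only when it sits
// directly before the final extension.
function stripSizeSuffix(string $path): string
{
    return preg_replace('/-\d+x\d+(\.\w+)$/', '$1', $path);
}

$samples = array(
    'img-300x300.jpg',
    'img1-250x270.jpg',
    'mywebsite.com/wp-content/uploads/2012/11/img-300x300.jpg',
);

foreach ($samples as $sample) {
    echo stripSizeSuffix($sample), "\n";
}
```

Because of the $ anchor, hyphens earlier in the path or file name are left alone.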
search : \-[^.]+
replace with : ''
(.[^\-]*)(?:.[^\.]*)\.(.*)
group 1 - name before "-"
group 2 - extension. (everything after ".")
As long as there is only one - and one . then explode() should work great for this:
<?php
// array of image names
$images = array();
$images[] = 'img-300x300.jpg';
$images[] = 'img1-250x270.jpg';
// array to store new image names
$new_names = array();
// loop through images
foreach($images as $v)
{
// explode on dashes
// so we would have something like:
// $explode1[0] = 'img';
// $explode1[1] = '300x300.jpg';
$explode1 = explode('-',$v);
// explode the second piece on the period
// so we have:
// $explode2[0] = '300x300';
// $explode2[1] = 'jpg';
$explode2 = explode('.',$explode1[1]);
// now bring it all together
// this translates to
// img.jpg and img1.jpg
$new_names[] = $explode1[0].'.'.$explode2[1];
}
echo '<pre>'.print_r($new_names, true).'</pre>';
?>
That's an interesting question, and since you are using php, it can be nicely solved with a branch reset (a feature of Perl, PCRE and a few other engines).
Search: img(?|(\d+)-\d{3}x\d{3}|-\d{3}x\d{3})\.jpg
Replace: img\1.jpg
The benefit of this solution, compared with a vague replacement, is that we are sure that we are matching a file whose name matches the format you specified.
I have a string in PHP that looks like this:
$(window).load(function(){
    $('.someclass').click(function () {
        $(this).text("clicked");
    });
});
What I want: if the string contains $(window).load(function(){ then replace it, and also the closing });, with an empty string "".
But if $(window).load(function(){ does not exist, then do nothing.
Here is what I have tried:
if(strpos($str,"$(window).load(function(){") == -1){
// do nothing
}
else{
str_replace("$(window).load(function(){","",$str);
// how do i replace the last }); with ""
}
If your code is nicely indented like that, this might just work for you:
$str = <<<EOM
$(window).load(function(){
    $('.someclass').click(function () {
        $(this).text("clicked");
    });
});
EOM;
$start = preg_quote('$(window).load(function(){');
$end = preg_quote('});');
$new = preg_replace("/^$start\s*(.*?)^$end/ms", '$1', $str);
print_r($new);
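You can use a regular expression for this one if you can guarantee that the }); will be the last one. If so:
$str = preg_replace('#\$\(window\)\.load\(function\(\)\s*\{(.*)\}\);#is', '$1', trim($str));
should do the trick. The pattern is single-quoted so that \$ reaches the regex engine as an escaped dollar sign, and the replacement '$1' keeps the inner code instead of deleting it.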
If you cannot guarantee that the }); you want to replace will be the last occurence, you will have to walk through your code and count the braces. No other way, sadly :-(
$str = substr($str, 0, strlen($str) - 3);
This removes the last 3 characters of the string, i.e. the closing });, as long as nothing (not even a newline) follows it.
Find the position of the last occurrence with strrpos, then remove it with substr_replace at that offset (str_replace cannot target only the last occurrence). You should check the modified string with an external tool such as JSLint to make sure you didn't create malformed code.
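A rough sketch of that strrpos idea, assuming the wrapper text is exactly $(window).load(function(){ and that the last }); in the string is the one that closes it. The function name unwrapWindowLoad is made up for illustration. Note that strpos returns false, not -1, when nothing is found:

```php
<?php
// Hypothetical helper: strips the $(window).load wrapper if present,
// otherwise returns the string unchanged.
function unwrapWindowLoad(string $str): string
{
    $open  = '$(window).load(function(){';
    $close = '});';

    $start = strpos($str, $open);
    if ($start === false) {
        return $str; // wrapper not present: do nothing
    }

    $end = strrpos($str, $close);
    if ($end === false || $end < $start + strlen($open)) {
        return $str; // no closing part after the opening part
    }

    // Remove the closing part first so the $start offset stays valid.
    $str = substr_replace($str, '', $end, strlen($close));
    return substr_replace($str, '', $start, strlen($open));
}

$js = '$(window).load(function(){' . "\n" . 'alert(1);' . "\n" . '});';
echo unwrapWindowLoad($js);
```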
I think one workable approach is simply to test for $(window).load and replace it like this:
$str = str_replace('$(window).load', "var functionOnLoad = ", $str);
Don't forget to add a call to this function if you want it to be executed. Something like:
$str = str_replace('</script>', "functionOnLoad();</script>", $str);
I'm trying to write a page-scraping script to take a currency rate off a site. I need some help writing the regular expression.
Here is what I have so far.
<?php
function converter(){
// Create DOM from URL or file
$html = file_get_contents("http://www.bloomberg.com/personal-finance/calculators/currency-converter/");
// Find currencies. ( using h1 to test)
preg_match('/<h1>(.*)<\/h1>/i', $html, $title);
$title_out = $title[1];
echo $title_out;
}
$foo = converter();
echo $foo;
?>
Here is where the currencies are kept on the Bloomberg site.
site: http://www.bloomberg.com/personal-finance/calculators/currency-converter/
//<![CDATA[
var test_obj = new Object();
var price = new Object();
price['ADP:CUR'] = 125.376;
What would the expression look like to get that rate?
Any help would be great!!
This works for me. Does it need to be more flexible? Does it need to allow varying whitespace around the equals sign, or is it always exactly one space?
"/price\['ADP:CUR'\] = (\d+\.\d+)/"
Usage:
if(preg_match("/price\['ADP:CUR'\] = (\d+\.\d+)/", $YOUR_HTML, $m)) {
//Result is in $m[1]
} else {
//Not found
}
There you go:
/ADP:CUR[^=]*=\s*(.*?);/i
This returns an associative array mirroring the JavaScript price object on the Bloomberg site.
<?php
$data = file_get_contents('http://www.bloomberg.com/personal-finance/calculators/currency-converter/');
$expression = '/price\\[\'(.*?)\'\\]\\s+=\\s+([+-]?\\d*\\.\\d+)(?![-+0-9\\.]);/';
preg_match_all($expression, $data, $matches);
$array = array_combine($matches[1], $matches[2]);
print_r($array);
echo $array['ADP:CUR'];// prints 125.376 (the value is held as a string)
?>
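To try the idea without a network call, here is a sketch that runs on an in-memory snippet. The XXX:CUR entry is a made-up placeholder, not real data, and the pattern is a slightly looser variant (optional whitespace, optional fraction) rather than the exact expression above:

```php
<?php
// Stand-in for the downloaded page; the XXX:CUR line is a placeholder.
$html = <<<'HTML'
//<![CDATA[
var test_obj = new Object();
var price = new Object();
price['ADP:CUR'] = 125.376;
price['XXX:CUR']=2;
HTML;

// Group 1: the key between the quotes. Group 2: the numeric value.
$pattern = '/price\[\'([^\']+)\'\]\s*=\s*([+-]?\d+(?:\.\d+)?)\s*;/';

preg_match_all($pattern, $html, $m);

// Build key => float pairs from the two capture groups.
$rates = array_combine($m[1], array_map('floatval', $m[2]));

print_r($rates);
```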
http://example.com/movie.swf?url=http%3A%2F%2Fexample.com%2Fmovie%2F1234567
http://example.com/movie.swf?url=http%3A%2F%2Fexample.com%2Fmovie%2F1234567%2Fsubtitle
http://example.com/movie.swf?url=http%3A%2F%2Fexample.com%2Fmovie%2F123456
http://example.com/movie.swf?url=http%3A%2F%2Fexample.com%2Fmovie%2F123456%2Fsubtitle
http://example.com/movie.swf?url=http%3A%2F%2Fexample.com%2Fmovie%2F1234567%2F
http://example.com/movie.swf?url=http%3A%2F%2Fexample.com%2Fmovie%2F123456%2F
The URLs always start with:
http://example.com/movie.swf?url=http%3A%2F%2Fexample.com%2Fmovie%2F
The ids are always numeric, however the number of digits can vary.
How do I get the id (1234567 and 123456) from the sample URLs above?
I've tried using the following pattern without luck (it doesn't return any matches):
/^http://example.com/movie.swf?url=http%3A%2F%2Fexample.com%2Fmovie%2F(\d)$/
I would recommend first parsing this URL, extracting the url query string parameter, and URL-decoding it:
function getParameterByName(url, name)
{
name = name.replace(/[\[]/, "\\\[").replace(/[\]]/, "\\\]");
var regexS = "[\\?&]" + name + "=([^&#]*)";
var regex = new RegExp(regexS);
var results = regex.exec(url);
if(results == null)
return "";
else
return decodeURIComponent(results[1].replace(/\+/g, " "));
}
like this:
var url = 'http://example.com/movie.swf?url=http%3A%2F%2Fexample.com%2Fmovie%2F1234567';
var p = getParameterByName(url, 'url');
and then use some regex to parse p and extract the necessary information like /\d+/.
With proper URL parsing functions you can do this:
parse_str(parse_url($url, PHP_URL_QUERY), $params);
if (isset($params['url'])) {
// parse_str has already decoded the inner URL; the id is in its path,
// e.g. /movie/1234567/subtitle, not in a query string
$path = parse_url($params['url'], PHP_URL_PATH);
if (preg_match('#^/movie/(\d+)#', $path, $m)) {
$movie = $m[1];
}
}
$urls = array(
'http://example.com/movie.swf?url=http%3A%2F%2Fexample.com%2Fmovie%2F1234567'
, 'http://example.com/movie.swf?url=http%3A%2F%2Fexample.com%2Fmovie%2F1234567%2Fsubtitle'
, 'http://example.com/movie.swf?url=http%3A%2F%2Fexample.com%2Fmovie%2F123456'
, 'http://example.com/movie.swf?url=http%3A%2F%2Fexample.com%2Fmovie%2F123456%2Fsubtitle'
, 'http://example.com/movie.swf?url=http%3A%2F%2Fexample.com%2Fmovie%2F1234567%2F'
, 'http://example.com/movie.swf?url=http%3A%2F%2Fexample.com%2Fmovie%2F123456%2F'
);
foreach ($urls as $url) {
if (preg_match('/%2Fmovie%2F(\d+)/', $url, $matches)) {
var_dump($matches[1]);
}
}
KISS. I was originally going to use parse_url(), and parse_str() can decode the query string, but the id still has to be pulled out of the decoded path, so a single regular expression is simpler.
There's a way without parsing too. Assuming $url holds the URL:
http://codepad.org/t91DK9H2
$url = "http://example.com/movie.swf?url=http%3A%2F%2Fexample.com%2Fmovie%2F1234567%2Fsubtitle";
$reg = "/^([\w\d\.:]+).*movie%2F(\d+).*/";
$id = preg_replace($reg,"$2",$url);
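It looks like you need to escape some special characters and pick a delimiter other than /, since the pattern itself contains slashes. The id is also longer than one digit, and some of the URLs continue after it, so use \d+ and drop the trailing $.
Try:
#^http://example\.com/movie\.swf\?url=http%3A%2F%2Fexample\.com%2Fmovie%2F(\d+)#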
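As a quick check of an escaped pattern against the sample URLs, here is a sketch that uses # as the delimiter so the slashes need no escaping. The function name extractMovieId is hypothetical:

```php
<?php
// Hypothetical helper: returns the numeric id, or null if the URL
// does not start with the expected prefix.
function extractMovieId(string $url): ?string
{
    $pattern = '#^http://example\.com/movie\.swf\?url='
             . 'http%3A%2F%2Fexample\.com%2Fmovie%2F(\d+)#';

    return preg_match($pattern, $url, $m) ? $m[1] : null;
}

$urls = array(
    'http://example.com/movie.swf?url=http%3A%2F%2Fexample.com%2Fmovie%2F1234567',
    'http://example.com/movie.swf?url=http%3A%2F%2Fexample.com%2Fmovie%2F1234567%2Fsubtitle',
    'http://example.com/movie.swf?url=http%3A%2F%2Fexample.com%2Fmovie%2F123456%2F',
);

foreach ($urls as $url) {
    var_dump(extractMovieId($url));
}
```

There is no $ anchor at the end, so trailing parts such as %2Fsubtitle do not prevent a match.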