Forgive me, I'm a new programmer. How can I take the value captured in $matches (from preg_match_all), strip its first character, and assign the result to another variable ($funded) in PHP? Here is what I have so far:
<?php
$content = file_get_contents("https://join.app.net");
//echo $content;
preg_match_all ("/<div class=\"stat-number\">([^`]*?)<\/div>/", $content, $matches);
//testing the array $matches
//echo sprintf('<pre>%s</pre>', print_r($matches, true));
$funded = $matches[0][1];
echo substr($funded, 1);
?>
Don't parse HTML with RegEx.
The best way is to use PHP DOM:
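<?php
// Fetch the page with cURL.
$handle = curl_init('https://join.app.net');
// Note: disabling peer verification is insecure; prefer a proper CA bundle.
curl_setopt($handle, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
$raw = curl_exec($handle);
curl_close($handle);

$doc = new DOMDocument();
libxml_use_internal_errors(true); // real-world HTML is rarely valid; collect warnings quietly
$doc->loadHTML($raw);
libxml_clear_errors();

$funded = ''; // stays empty if no matching div is found
foreach ($doc->getElementsByTagName('div') as $item) {
    // Keep the stat-number div whose text contains a dollar sign.
    if ($item->getAttribute('class') == 'stat-number'
        && strpos($item->textContent, '$') !== false) {
        $funded = $item->textContent;
    }
}
// Remove the $ sign, the commas, and anything else that is not a digit.
$funded = preg_replace('/[^0-9]/', '', $funded);
echo $funded;
?>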
This returned 380950 at the time of posting.
I'm not 100% sure, but it looks like you are trying to get the current funding amount in dollars, and the character you want to strip is the dollar sign.
If so, why not add the dollar sign to the regex outside the group, so it isn't captured?
/<div class=\"stat-number\">\$([^`]*?)<\/div>/
Because $ means end of line in a regex, you must escape it with a backslash. Inside a double-quoted PHP string, \$ is itself an escape for a plain $, so write \\\$ there or put the pattern in single quotes.
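A minimal sketch of that suggestion, run against a made-up HTML fragment (the markup and the numbers are assumptions for illustration, not the live page). The pattern is single-quoted so that `\$` reaches the regex engine unchanged, and `[^<]*` is used here in place of the original character class:

```php
<?php
// Hypothetical sample markup; the real page may look different.
$content = '<div class="stat-number">$380,950</div>'
         . '<div class="stat-number">7,511</div>';

// \$ matches a literal dollar sign. It sits outside the capture group,
// so only the amount itself is captured.
preg_match_all('/<div class="stat-number">\$([^<]*)<\/div>/', $content, $matches);

// $matches[0] holds the full matches; $matches[1] holds capture group 1.
$funded = $matches[1][0];
echo $funded, "\n";
```

Note that the captured text lives in `$matches[1]`, while `$matches[0]` contains the whole matched `<div>...</div>` strings.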
Please help me check this code. I think my regex has a problem, but I don't know how to fix it:
function get_data($url)
{
$ch = curl_init();
$timeout = 5;
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch,CURLOPT_CONNECTTIMEOUT,$timeout);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
$content = get_data('http://ibongda.vn/lich-thi-dau-bong-da.hs');
$regex = '/<div id="zone-schedule-group-by-season">(.*)<\/div>/';
preg_match($regex, $content, $matches);
$table = $matches[1];
print_r($table);
I would advise against using regular expressions for this; DOM is better suited to the task.
The problem with your regular expression is newlines. By default the dot does not match a newline, so (.*) can never reach a </div> that sits on a later line, and after backtracking the match fails. You need the s (dotall) modifier, which makes the dot match newlines as well. Making the quantifier lazy (.*?) also stops the match at the first </div> rather than the last one:
$regex = '~<div id="zone-schedule-group-by-season">(.*?)</div>~s';
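To see the effect of the s modifier in isolation, here is a small sketch that applies the same pattern with and without it to a made-up multi-line fragment (the sample markup is an assumption, not the real page):

```php
<?php
// Hypothetical multi-line fragment standing in for the fetched page.
$content = "<div id=\"zone-schedule-group-by-season\">\n"
         . "<table><tr><td>Match 1</td></tr></table>\n"
         . "</div>";

// Without s, the dot cannot cross the newline after the opening tag.
$withoutS = preg_match('~<div id="zone-schedule-group-by-season">(.*?)</div>~', $content, $m1);
// With s, the dot matches newlines too, so the closing tag can be reached.
$withS    = preg_match('~<div id="zone-schedule-group-by-season">(.*?)</div>~s', $content, $m2);

var_dump($withoutS); // int(0)
var_dump($withS);    // int(1)
echo trim($m2[1]), "\n";
```

Keep in mind that the lazy group ends at the first `</div>` it meets, so a nested `<div>` inside the block would cut the capture short. That is one more reason to prefer a DOM parser.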
I suggest not using regex to parse this. Use an HTML parser instead, in particular DOMDocument with XPath.
function get_data($url)
{
$ch = curl_init();
$timeout = 5;
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch,CURLOPT_CONNECTTIMEOUT,$timeout);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
$content = get_data('http://ibongda.vn/lich-thi-dau-bong-da.hs');
$dom = new DOMDocument();
libxml_use_internal_errors(true); // handle errors yourself
$dom->loadHTML($content);
libxml_clear_errors();
$xpath = new DOMXpath($dom);
$table_rows = $xpath->query('//div[@id="zone-schedule-group-by-season"]/table/tbody/tr[@class!="bg-gd" and @class!="table-title"]'); // these are the rows of that table
foreach($table_rows as $rows) { // loop each tr
foreach($rows->childNodes as $td) { // loop each td
if(trim($td->nodeValue) != '') { // don't show empty td
echo trim($td->nodeValue) . '<br/>';
}
}
echo '<hr/>';
}
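The same query can be tried offline. The sketch below runs it against a small inline document instead of the live site; the markup and the class name "row" are invented for illustration. XPath selects attributes with `@`, as in `@id` and `@class`.

```php
<?php
// Hypothetical markup that mimics the structure the query expects.
$html = <<<HTML
<div id="zone-schedule-group-by-season">
<table><tbody>
<tr class="table-title"><td>Header</td></tr>
<tr class="row"><td>Team A</td><td>Team B</td></tr>
</tbody></table>
</div>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$rows = $xpath->query('//div[@id="zone-schedule-group-by-season"]/table/tbody/tr[@class!="bg-gd" and @class!="table-title"]');

// Collect the non-empty cell texts of every selected row.
$cells = [];
foreach ($rows as $row) {
    foreach ($row->childNodes as $td) {
        $text = trim($td->nodeValue);
        if ($text !== '') {
            $cells[] = $text;
        }
    }
}
print_r($cells);
```

One detail of this predicate: a `tr` with no class attribute at all is not selected, because `@class != "..."` is false when the attribute is missing.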
I want a regex that finds the part shown below inside a larger block of code.
The part that I want to find:
-->Copy frame link\",\"url240\":\"http:\/\/cs534515v4.vk.me\/u163220668\/videos\/1c1b06aec9.240.mp4\",\"url360\":\"http:\/\/cs534515v4.vk.me\/u163220668\/videos\/1c1b06aec9.360.mp4\",\"jpg\"<--
This code forms part of an HTML page, and I want to retrieve only the part shown. I am writing the code in PHP.
My complete code:
<?php
set_time_limit(0);
function get_content_of_url($url){
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
$content = curl_exec($ch);
curl_close($ch);
return $content;
}
$plyst = get_content_of_url("http://vk.com/video56612186_167113956");
preg_match('/link\\".*"jpg\\"/', $plyst , $matches);
var_dump($matches);
//preg_match('/http:\/\/[a-zA-Z0-9\\/-_.]+/', $matches[0][0], $id);
//start_script($id[0]);
?>
How about this.
$str = "video_get_current_url\":\"Copy frame link\",\"url240\":\"http:\\\/\\\/cs534515v4.vk.me\\\/u163220668\\\/videos\\\/1c1b06aec9.240.mp4\",\"url360\":\"http:\\\/\\\/cs534515v4.vk.me\\\/u163220668\\\/videos\\\/1c1b06aec9.360.mp4\",\"jpg\":\"http:\\\/\\\/cs534515.vk.me\\\/u163220668\\\/video\\\/l_8a5b0712.jpg\",\"ip_subm\":1,\"nologo";
preg_match('/\\"Copy\sframe.*"jpg\\"/is', $str, $matches);
var_dump($matches);
Output:
array(1) {
[0]=>
string(199) ""Copy frame link","url240":"http:\\/\\/cs534515v4.vk.me\\/u163220668\\/videos\\/1c1b06aec9.240.mp4","url360":"http:\\/\\/cs534515v4.vk.me\\/u163220668\\/videos\\/1c1b06aec9.360.mp4","jpg""
}
Edit:
And then, if you want to extract the video URLs from that:
preg_match_all('/(https?:.*?\.mp4)/', $matches[0], $id);
// Then echo out the URLs
foreach ($id[0] as $url) {
    // preg_replace strips out the backslashes that escape each slash.
    echo preg_replace('/\\\\/', '', $url)."<br />";
}
Output:
http://cs534515v4.vk.me/u163220668/videos/1c1b06aec9.240.mp4
http://cs534515v4.vk.me/u163220668/videos/1c1b06aec9.360.mp4
Working example: http://sandbox.onlinephpfunctions.com/code/329106d990fe8927a7670b9448770643afbd0865
I want to remove all occurrences of a specific parameter pattern from a URL using preg_replace, and also remove the trailing "&" if one exists.
The pattern looks like: make=xy ("make" is fixed; "xy" can be any two letters)
Example:
http://example.com/index.php?c=y&make=yu&do=ms&r=k&p=7&
After processing preg_replace, the outcome should be:
http://example.com/index.php?c=y&do=ms&r=k&p=7
I tried using:
$url = "index.php?ok=no&make=ae&make=as&something=no&make=gr";
$url = preg_replace('/(&?lang=..&?)/i', '', $url);
However, this did not work well because I have duplicates of make=xx in the URL (which is a case that could happen in my app).
You don't need RegEx for this:
$url = "http://example.com/index.php?ok=no&make=ae&make=as&something=no&make=gr&";
list($file, $parameters) = explode('?', $url);
parse_str($parameters, $output);
unset($output['make']); // remove the make parameter
$result = $file . '?' . http_build_query($output); // Rebuild the url
echo $result; // http://example.com/index.php?ok=no&something=no
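This works even with repeated parameters because parse_str() keeps a single entry per key (the last value wins), so one unset() removes them all, and the empty segment after a trailing & is ignored. A slightly more defensive sketch, which also copes with a URL that has no query string (the sample URL is made up for illustration):

```php
<?php
$url = 'http://example.com/index.php?c=y&make=yu&do=ms&make=ab&r=k&p=7&';

// Split into at most two parts so a URL without "?" still works.
$parts = explode('?', $url, 2);
$file  = $parts[0];
$query = $parts[1] ?? '';

parse_str($query, $params); // repeated "make" keys collapse into one entry
unset($params['make']);

// Only append "?" when some parameters are left.
$result = $params ? $file . '?' . http_build_query($params) : $file;
echo $result, "\n"; // http://example.com/index.php?c=y&do=ms&r=k&p=7
```

Be aware that http_build_query() re-encodes the values, so a query string with unusual encoding may not come back byte for byte.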
You could try using:
$str = parse_url($url, PHP_URL_QUERY);
$query = array();
parse_str($str, $query);
var_dump($query);
This gives you the query as an array. You can then use http_build_query() to turn the array back into a query string.
But if you want to use a regexp:
$url = "index.php?make=ae&ok=no&make=ae&make=as&something=no&make=gr";
echo $url."\n";
$url = preg_replace('/\b(&?make=[^&]*)\b/i','',$url);
$url = str_replace('?&','?',$url);
echo $url;
This will remove every make parameter from the URL.
With rtrim you can remove the trailing &:
$url = rtrim("http://example.com/index.php?c=y&make=yu&do=ms&r=k&p=7&","&");
$url = preg_replace('~&make=([a-z\-]*)~si', '', $url);
$url = "index.php?ok=no&make=ae&make=as&something=no&make=gr";
$url = preg_replace('/(&?make=[a-z]{2})/i', '', $url);
echo $url;
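A pattern of this kind leaves a stray separator behind when make is the first parameter (for example `index.php?make=ae&ok=no` becomes `index.php?&ok=no`). One possible variation, sketched below, removes each pair together with the separator that follows it and then trims what is left at the end. The function name remove_make is hypothetical.

```php
<?php
// Hypothetical helper: strips every make=xx pair (xx = two letters) from a URL.
function remove_make(string $url): string
{
    // The lookbehind requires "?" or "&" right before "make", so a parameter
    // such as "xmake" is left alone. The pair is removed together with the
    // "&" that follows it, or up to the end of the string.
    $url = preg_replace('/(?<=[?&])make=[a-z]{2}(?:&|$)/i', '', $url);

    // Drop a separator left dangling at the end of the URL.
    return rtrim($url, '?&');
}

echo remove_make('http://example.com/index.php?c=y&make=yu&do=ms&r=k&p=7&'), "\n";
echo remove_make('index.php?ok=no&make=ae&make=as&something=no&make=gr'), "\n";
echo remove_make('index.php?make=ae'), "\n";
```

The three calls print the URL with c, do, r and p only, then `index.php?ok=no&something=no`, then `index.php`.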
Using only preg_replace:
$x = "http://example.com/index.php?c1=y&make=yu&do1=ms&r1=k&p1=7&";
$x = preg_replace(['/(\?make=[a-z]*[&])/i', '/(\&make=[a-z]*[^(&*)])/i', '/&(?!\w)/i'], ['?','',''], $x);
echo $x;
And the result is: http://example.com/index.php?c1=y&do1=ms&r1=k&p1=7
Hope this will be helpful to you guys.
I'm trying to replace some URLs in a database (wordpress) with another, but it's tricky because a lot of the URLs are redirects. I'm trying to either replace the URL with the redirected URL, or with a URL of my choosing, based on the result. I can get the matching done without any problems, but I can't replace it. I've tried str_replace, but it doesn't seem to replace the URLs. When I try preg_replace, it will give "Warning: preg_replace(): Delimiter must not be alphanumeric or backslash". Can anyone point me in the right way to do this?
if(preg_match($url_regex,$row['post_content'])){
preg_match_all($url_regex,$row['post_content'],$matches);
foreach($matches[0] as $match){
echo "{$row['ID']} \t{$row['post_date']} \t{$row['post_title']}\t{$row['guid']}";
$newUrl = NULL;
if(stripos($url_regex,'domain1') !== false || stripos($url_regex,'domain2') !== false || stripos($url_regex,'domain3') !== false){
$match = str_replace('&amp;','&',$match);
$ch = curl_init();
curl_setopt($ch,CURLOPT_URL,$match);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$html = curl_exec($ch);
$newUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
if(stripos($newUrl,'domain4') !== false)
$newUrl = NULL;
}
else
if($newUrl == NULL)
{ $newUrl = 'http://www.mysite.com/';
}
echo "\t$match\t$newUrl";
$content = str_replace($match,$newUrl,$row['post_content']);
echo "\t (" . strlen($content).")";
echo "\n";
}
}
This is how you would do it with Perl-compatible regular expressions (the preg_* functions). Note that every pattern is wrapped in delimiters (here /) and that literal dots are escaped:
$baseUrlMappings = array('/www\.yoursite\.com/i' => 'www.mysite.com',
'/www\.yoursite2\.com/i' => 'www.mysite2.com',);
echo preg_replace(array_keys($baseUrlMappings), array_values($baseUrlMappings), 'http://www.yoursite.com/foo/bar?id=123');
echo preg_replace(array_keys($baseUrlMappings), array_values($baseUrlMappings), 'http://www.yoursite2.com/foo/bar?id=123');
http://codepad.viper-7.com/2ne7u6
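The "Delimiter must not be alphanumeric or backslash" warning from the question most likely appears because a raw URL was passed as the pattern: the preg_* functions treat the first character as the delimiter, and the "h" of "http" is alphanumeric. A minimal sketch of one way around it, assuming the matched URL should be replaced literally (the sample strings are made up):

```php
<?php
$content = 'See http://www.yoursite.com/foo?id=1&x=2 for details.';
$match   = 'http://www.yoursite.com/foo?id=1&x=2';
$newUrl  = 'http://www.mysite.com/';

// preg_quote() escapes regex metacharacters such as "." and "?",
// plus the delimiter passed as its second argument.
$pattern = '~' . preg_quote($match, '~') . '~';

$content = preg_replace($pattern, $newUrl, $content);
echo $content, "\n"; // See http://www.mysite.com/ for details.
```

For a purely literal replacement like this one, str_replace($match, $newUrl, $content) does the same job, as long as the result is carried forward from one replacement to the next instead of starting again from the original text each time.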
Please read the manual! You should be able to figure this out.
Update Yahoo error
OK, so I got it all working, but preg_match_all won't work against Yahoo.
If you take a look at:
http://se.search.yahoo.com/search?p=random&toggle=1&cop=mss&ei=UTF-8&fr=yfp-t
then you can see that in their HTML they have
<span class="url" id="something random"> the actual link </span>
But when I try preg_match_all, I don't get any results.
preg_match_all('#<span class="url" id="(.*)">(.+?)</span>#si', $urlContents[2], $yahoo);
Does anyone have an idea?
End of update
I'm trying to run preg_match_all on the results I get from Google via cURL's curl_multi_getcontent method.
I have managed to fetch the pages, but when I try to extract the result links, the match captures too much.
I'm currently using:
preg_match_all('#<cite>(.+)</cite>#si', $urlContents[0], $links);
It starts where it should, but it doesn't stop; it just keeps going.
Check the HTML at www.google.com/search?q=random, for example, and you will see that every link starts with <cite> and ends with </cite>.
Could someone help me with how I should retrieve this information?
I only need the actual link address of each result.
Update Entire PHP Script
public function multiSearch($question)
{
$sites['google'] = "http://www.google.com/search?q={$question}&gl=sv";
$sites['bing'] = "http://www.bing.com/search?q={$question}";
$sites['yahoo'] = "http://se.search.yahoo.com/search?p={$question}";
$urlHandler = array();
foreach($sites as $site)
{
$handler = curl_init();
curl_setopt($handler, CURLOPT_URL, $site);
curl_setopt($handler, CURLOPT_HEADER, 0);
curl_setopt($handler, CURLOPT_RETURNTRANSFER, 1);
array_push($urlHandler, $handler);
}
$multiHandler = curl_multi_init();
foreach($urlHandler as $key => $url)
{
curl_multi_add_handle($multiHandler, $url);
}
$running = null;
do
{
curl_multi_exec($multiHandler, $running);
}
while($running > 0);
$urlContents = array();
foreach($urlHandler as $key => $url)
{
$urlContents[$key] = curl_multi_getcontent($url);
}
foreach($urlHandler as $key => $url)
{
curl_multi_remove_handle($multiHandler, $url);
}
foreach($urlContents as $urlContent)
{
preg_match_all('/<li class="g">(.*?)<\/li>/si', $urlContent, $matches);
//$this->view_data['results'][] = "Random";
}
preg_match_all('#<div id="search"(.*)</ol></div>#i', $urlContents[0], $match);
preg_match_all('#<cite>(.+)</cite>#si', $urlContents[0], $links);
var_dump($links);
}
Run the regular expression in ungreedy mode (the U modifier):
preg_match_all('#<cite>(.+)</cite>#siU', $urlContents[0], $links);
As in Darhazer's answer, you can turn on ungreedy mode for the whole regex with the U pattern modifier, or make just that quantifier ungreedy (lazy) by following it with a ?:
preg_match_all('#<cite>(.+?)</cite>#si', ...
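The difference is easy to see on a small made-up fragment (the sample HTML is an assumption, not actual Google output):

```php
<?php
// Hypothetical fragment with two <cite> elements.
$html = '<cite>www.example.com/a</cite><p>text</p><cite>www.example.org/b</cite>';

// Greedy: .+ runs to the LAST </cite>, swallowing everything in between.
preg_match_all('#<cite>(.+)</cite>#si', $html, $greedy);
// Lazy: .+? stops at the FIRST </cite> after each opening tag.
preg_match_all('#<cite>(.+?)</cite>#si', $html, $lazy);

print_r($greedy[1]); // one element spanning both links
print_r($lazy[1]);   // two elements, one per link
```

With the greedy quantifier there is a single capture that runs from the first link to the second; with the lazy one each link is captured on its own.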