I had a PHP file that used regex to extract the m3u8 link from YouTube, and it worked fine until last week.
I used to pass the YouTube ID like this:
http://server.com/youtube.php?id=youtubeid
$string = get_data('https://www.youtube.com/watch?v=' . $channelid);
if(preg_match('#"hlsManifestUrl.":."(.*?m3u8)#', $string, $match)) {
$var1=$match[1];
$var1=str_replace("\/", "/", $var1);
$man = get_data($var1);
//echo $man;
preg_match_all('/(https:\/.*\/95\/.*index.m3u8)/U',$man,$matches, PREG_PATTERN_ORDER);
$var2=$matches[1][0];
header("Content-type: application/vnd.apple.mpegurl");
header("Location: $var2");
}
else {
preg_match_all('#itag.":([^,]+),."url.":."(.*?).".*?qualityLabel.":."(.*?)p."#', $string, $match);
//preg_match_all('#itag.":([^,]+),."url.":."(.*?).".*?bitrate.":.([^,]+),#', $string, $match);
$filter_keys = array_filter($match[3], function($element) {
return $element <= 720;
});
//print_r($filter_keys);
$max_key = array_keys($filter_keys, max($filter_keys))[0];
//print_r($max_key);
$urls = $match[2];
foreach($urls as &$url) {
$url = str_replace('\/', '/', $url);
$url = str_replace('\\\u0026', '&', $url);
}
// Any output sent before header() prevents the redirect, so keep debug output disabled
//print_r($urls[$max_key]);
header('location: ' . $urls[$max_key]);
}
How do I solve this problem?
Based on this post, I'm guessing that the desired URLs might look like https://www.youtube.com/watch?v=_Gtc-GtLlTk,
and we can write a simple expression such as:
(.+\?v=)(.+)
We can also add more boundaries to it, if it was necessary.
RegEx
If this expression wasn't desired, you can modify/change your expressions in regex101.com.
RegEx Circuit
You can also visualize your expressions in jex.im.
PHP Test
$re = '/(.+\?v=)(.+)/m';
$str = ' https://www.youtube.com/watch?v=_Gtc-GtLlTk';
$subst = '$2';
$result = preg_replace($re, $subst, $str);
echo $result;
JavaScript Demo
This snippet shows that we likely have a valid expression:
const regex = /(.+\?v=)(.+)/gm;
const str = ` https://www.youtube.com/watch?v=_Gtc-GtLlTk`;
const subst = `$2`;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log('Substitution result: ', result);
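If the goal is only to pull the video ID out of a watch URL, PHP's built-in URL functions may be a sturdier option than a regex. This is a minimal sketch, assuming the ID is always carried in the v query parameter; the helper name youtubeIdFromUrl is made up for illustration.

```php
<?php
// Hypothetical helper (not from the question): returns the "v" query
// parameter of a watch URL, or null when it is absent.
function youtubeIdFromUrl(string $url): ?string
{
    // parse_url() with PHP_URL_QUERY returns only the query string part
    $query = parse_url(trim($url), PHP_URL_QUERY);
    if (!is_string($query)) {
        return null;
    }
    // parse_str() splits "v=...&t=..." into an associative array
    parse_str($query, $params);
    if (isset($params['v']) && is_string($params['v']) && $params['v'] !== '') {
        return $params['v'];
    }
    return null;
}

echo youtubeIdFromUrl(' https://www.youtube.com/watch?v=_Gtc-GtLlTk'), "\n";
```

Unlike (.+\?v=)(.+), this does not pick up additional parameters such as &t=10 that may follow the ID.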
I'm using the code given on this page to look through a string and turn the URL into an HTML link.
It works quite well, but there is a little issue with the "replace" part of it.
The problem occurs when I have almost identical links. For example:
https://example.com/page.php?goto=200
and
https://example.com/page.php
Everything will be fine with the first link, but the second will create a <a> tag in the first <a> tag.
First run
https://example.com/page.php?goto=200
Second
https://example.com/page.php?goto=200">https://example.com/page.php?goto=200</a>
Because it's also replacing the html link just created.
How do I avoid this?
<?php
function turnUrlIntoHyperlink($string){
//The Regular Expression filter
$reg_exUrl = "/(?i)\b((?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))/";
// Check if there is a url in the text
if(preg_match_all($reg_exUrl, $string, $url)) {
// Loop through all matches
foreach($url[0] as $newLinks){
if(strstr( $newLinks, ":" ) === false){
$link = 'http://'.$newLinks;
}else{
$link = $newLinks;
}
// Create Search and Replace strings
$search = $newLinks;
$replace = '<a href="'.$link.'">'.$link.'</a>';
$string = str_replace($search, $replace, $string);
}
}
//Return result
return $string;
}
?>
You need to add a whitespace identifier \s at the start of your regex and remove \b, because \b only returns the last match.
Your regex can be written as:
$reg_exUrl = "/(?i)\s((?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))/";
check this one: https://regex101.com/r/YFQPlZ/1
I have changed the replace part a bit, since I couldn't get the suggested regex to work.
Maybe it can be done better, but I'm still learning :)
function turnUrlIntoHyperlink($string){
//The Regular Expression filter
$reg_exUrl = "/(?i)\b((?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))/";
// Check if there is a url in the text
if(preg_match_all($reg_exUrl, $string, $url)) {
// Loop through all matches
$replace = array();
foreach($url[0] as $key => $newLinks){
if(strstr( $newLinks, ":" ) === false){
$link = 'https://'.$newLinks;
}else{
$link = $newLinks;
}
// Remember the HTML link and swap the first occurrence of the URL for a placeholder
$replace[$key] = '<a href="'.$link.'">'.$link.'</a>';
$newLinks = '/'.preg_quote($newLinks, '/').'/';
$string = preg_replace($newLinks, '{'.$key.'}', $string, 1);
}
// Swap each placeholder for its HTML link
foreach ($replace as $key => $link) {
$string = str_replace('{'.$key.'}', $link, $string);
}
}
//Return result
return $string;
}
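Another way to avoid replacing inside a link that was just created is to do the whole job in a single pass with preg_replace_callback, so each URL is visited exactly once. The sketch below is only an illustration: the function name linkifyOnce and the simplified pattern are assumptions, not the pattern from the question, and trailing punctuation after a URL would be included in the match.

```php
<?php
// Hypothetical single-pass variant. The pattern is a deliberately simple
// stand-in for the long expression used in the question.
function linkifyOnce(string $text): string
{
    $pattern = '~\b(?:https?://|www\.)[^\s<>"]+~i';

    $result = preg_replace_callback($pattern, function (array $m): string {
        $shown = $m[0];
        // Prefix a scheme when the match starts with "www."
        $href = stripos($shown, 'http') === 0 ? $shown : 'https://' . $shown;
        return '<a href="' . htmlspecialchars($href) . '">'
             . htmlspecialchars($shown) . '</a>';
    }, $text);

    // preg_replace_callback() returns null on a regex error; fall back to the input
    return $result ?? $text;
}

echo linkifyOnce('See https://example.com/page.php?goto=200 and https://example.com/page.php'), "\n";
```

Because the replacement text is never scanned again, the shorter URL cannot be matched inside the anchor generated for the longer one.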
I'm trying to use preg_match_all to get all image links that begin with http://i.ebayimg.com/ and end with .jpg from a page that I'm scraping, but I can't get it right. I tried this, but it is not what I need:
preg_match_all('/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/', $contentas, $img_link);
I have the same problem with normal links. I don't know how to write a preg_match_all for this:
<a class="link--muted" href="http://suchen.mobile.de/fahrzeuge/details.html?id=218930381&daysAfterCreation=7&isSearchRequest=true&withImage=true&scopeId=C&categories=Limousine&damageUnrepaired=NO_DAMAGE_UNREPAIRED&zipcode=&fuels=DIESEL&ambitCountry=DE&maxPrice=11000&minFirstRegistrationDate=2006-01-01&makeModelVariant1.makeId=3500&makeModelVariant1.modelId=20&pageNumber=1" data-touch="hover" data-touch-wrapper=".cBox-body--resultitem">
Thank you very much!!!
UPDATE
I'm trying to get the car links, image links and all other information from here:
http://suchen.mobile.de/fahrzeuge/search.html?isSearchRequest=true&scopeId=C&makeModelVariant1.makeId=1900&makeModelVariant1.modelId=10&makeModelVariant1.modelDescription=&makeModelVariantExclusions%5B0%5D.makeId=&categories=Limousine&minSeats=&maxSeats=&doorCount=&minFirstRegistrationDate=2006-01-01&maxFirstRegistrationDate=&minMileage=&maxMileage=&minPrice=&maxPrice=11000&minPowerAsArray=&maxPowerAsArray=&maxPowerAsArray=PS&minPowerAsArray=PS&fuels=DIESEL&minCubicCapacity=&maxCubicCapacity=&ambitCountry=DE&zipcode=&q=&climatisation=&airbag=&daysAfterCreation=7&withImage=true&adLimitation=&export=&vatable=&maxConsumptionCombined=&emissionClass=&emissionsSticker=&damageUnrepaired=NO_DAMAGE_UNREPAIRED&numberOfPreviousOwners=&minHu=&usedCarSeals=
Scraping the information works fine, but I have a problem with scraping the images and links. Here is my script:
<?php
$id= $_GET['id'];
$user= $_GET['user'];
$login=$_COOKIE['login'];
$query = mysql_query("SELECT pavadinimas,nuoroda,kuras,data,data_new from mobile where vartotojas='$user' and id='$id'");
$rezultatas=mysql_fetch_row($query);
$url = "$rezultatas[1]";
$info = file_get_contents($url);
function scrape_between($data, $start, $end){
$data = stristr($data, $start);
$data = substr($data, strlen($start));
$stop = stripos($data, $end);
$data = substr($data, 0, $stop);
return $data;
}
//cut out the content
$turinys = scrape_between($info, '<div class="g-col-9">', '<footer class="footer">');
//filtering: remove the paid top ads
$contentas = preg_replace('/<div class="cBox-body cBox-body--topResultitem".*?>(.*?)<\/div>/', '' ,$turinys);
//filtering done
preg_match_all('/<span class="h3".*?>(.*?)<\/span>/',$contentas,$pavadinimas);
preg_match_all('/<span class="u-block u-pad-top-9 rbt-onlineSince".*?>(.*?)<\/span>/',$contentas,$data);
preg_match_all('/<span class="u-block u-pad-top-9".*?>(.*?)<\/span>/',$contentas,$miestas);
preg_match_all('/<span class="h3 u-block".*?>(.*?)<\/span>/', $contentas, $kaina);
preg_match_all('/<a[A-z0-9-_:="\.\/ ]+href="(http:\/\/suchen.mobile.de\/fahrzeuge\/[^"]*)"[A-z0-9-_:="\.\/ ]\s*>\s*<div/s', $contentas, $matches);
print_r($pavadinimas);
print_r($data);
print_r($miestas);
print_r($kaina);
print_r($result);
print_r($matches);
?>
1. To capture the src attribute, starting with http://i.ebayimg.com/, of all img tags:
regex: /src="((?:http|https):\/\/i\.ebayimg\.com\/.+?\.jpg)"/i
Here is an example:
$re = "/src=\"((?:http|https):\\/\\/i\\.ebayimg\\.com\\/.+?\\.jpg)\"/i";
$str = "codeOfHTMLPage";
preg_match_all($re, $str, $matches);
If you want to be sure that you capture this URL on an img tag, then use this regex (keep in mind that performance will decrease if the page is very long):
$re = "/<img(?:.*?)src=\"((?:http|https):\\/\\/i\\.ebayimg\\.com\\/.+?\\.jpg)\"/i";
2. To capture the href attribute, starting with http://suchen.mobile.de/fahrzeuge/, of all a tags:
regex: /href="((?:http|https):\/\/suchen\.mobile\.de\/fahrzeuge\/[^"]+)"/i
Here is an example:
$re = "/href=\"((?:http|https):\\/\\/suchen\\.mobile\\.de\\/fahrzeuge\\/[^\"]+)\"/i";
$str = "codeOfHTMLPage";
preg_match_all($re, $str, $matches);
If you want to be sure that you capture this URL on an a tag, then use this regex (keep in mind that performance will decrease if the page is very long):
$re = "/<a(?:.*?)href=\"((?:http|https):\\/\\/suchen\\.mobile\\.de\\/fahrzeuge\\/[^\"]+)\"/i";
More handy with DOMDocument:
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTMLFile($yourURL);
$imgNodes = $dom->getElementsByTagName('img');
$result = [];
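foreach ($imgNodes as $imgNode) {
$src = $imgNode->getAttribute('src');
$urlElts = parse_url($src);
if (!is_array($urlElts) || !isset($urlElts['host'], $urlElts['path'])) {
continue; // skip relative or malformed URLs
}
// pathinfo() avoids passing the result of explode() by reference to array_pop()
$ext = strtolower(pathinfo($urlElts['path'], PATHINFO_EXTENSION));
if ($ext == 'jpg' && $urlElts['host'] == 'i.ebayimg.com')
$result[] = $src;
}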
print_r($result);
To get "normal" links, use the same way (DOMDocument + parse_url).
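For the "normal" links, the same DOMDocument + parse_url idea could look roughly like this. It is a sketch under assumptions: the function name extractLinks and the sample HTML are invented for illustration, and it filters on the host and path prefix taken from the question.

```php
<?php
// Hypothetical helper: collects the href values of <a> tags whose host and
// path prefix match the given values.
function extractLinks(string $html, string $host, string $pathPrefix): array
{
    libxml_use_internal_errors(true); // tolerate malformed real-world HTML
    $dom = new DOMDocument();
    $dom->loadHTML($html);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        $parts = parse_url($href);
        if (is_array($parts)
            && ($parts['host'] ?? '') === $host
            && strpos($parts['path'] ?? '', $pathPrefix) === 0) {
            $links[] = $href;
        }
    }
    return $links;
}

// Made-up sample markup, shortened from the link shown in the question
$html = '<a class="link--muted" href="http://suchen.mobile.de/fahrzeuge/details.html?id=218930381&amp;pageNumber=1">Car</a>'
      . '<a href="http://example.com/other">Other</a>';
print_r(extractLinks($html, 'suchen.mobile.de', '/fahrzeuge/'));
```

getAttribute returns the decoded attribute value, so &amp; in the markup comes back as a plain & in the URL.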
It might seem easy to do, but I have trouble extracting this string. I have a string that has a # in it and I'm trying to pull out the part that follows it: maps/place/Residences+Jardins+de+Majorelle/#33.536759,-7.613825,17z/data=!3m1!4b1!4m2!3m1!1s0xda62d6053931323:0x2f978f4d1aabb1aa
Here is what I want to extract, 33.536759,-7.613825,17z, and what I tried:
$var = preg_match_all("/#(\w*)/",$path,$query);
Any way I can do this? Much appreciated.
Change your regex to this one: /#([\w\d\.\,-]*)/.
This will return the string beginning with #.
$string = 'maps/place/Residences+Jardins+de+Majorelle/#33.536759,-7.613825,17z/data=!3m1!4b1!4m2!3m1!1s0xda62d6053931323:0x2f978f4d1aabb1aa';
$string = explode('/',$string);
//$coordinates = substr($string[3], 1);
//print_r($coordinates);
foreach ($string as $substring) {
if (substr( $substring, 0, 1 ) === "#") {
$coordinates = $substring;
}
}
echo $coordinates;
This is working for me:
$path = "maps/place/Residences+Jardins+de+Majorelle/#33.536759,-7.613825,17z/data=!3m1!4b1!4m2!3m1!1s0xda62d6053931323:0x2f978f4d1aabb1aa";
$var = preg_match_all("/#([^\/]+)/",$path,$query);
print $query[1][0];
A regex would do.
/#(-?\d+\.\d+),(-?\d+\.\d+,\d+z?)/
If there is only one # and the coordinates are followed by a /, you can use the following code:
//String
$string = 'maps/place/Residences+Jardins+de+Majorelle/#33.536759,-7.613825,17z/data=!3m1!4b1!4m2!3m1!1s0xda62d6053931323:0x2f978f4d1aabb1aa';
//Save string after the first #
$coordinates = strstr($string, '#');
//Remove #
$coordinates = str_replace('#', '', $coordinates);
//Separate string on every /
$coordinates = explode('/', $coordinates );
//Save first part
$coordinates = $coordinates[0];
//Do what you want
echo $coordinates;
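You can do it like this (note that the pattern expects a negative longitude and captures it without the minus sign):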
$re = '/#((.*?),-(.*?),)/mi';
$str = 'maps/place/Residences+Jardins+de+Majorelle/#33.536759,-7.613825,17z/data=!3m1!4b1!4m2!3m1!1s0xda62d6053931323:0x2f978f4d1aabb1aa';
preg_match_all($re, $str, $matches);
echo $matches[2][0].'<br>';
echo $matches[3][0];
output
33.536759
7.613825
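If the individual values are needed as numbers with their signs intact, a stricter pattern can capture latitude, longitude and zoom separately. This is a minimal sketch, assuming the fragment always has the form #lat,lng,zoomz; the helper name parseMapFragment is hypothetical.

```php
<?php
// Hypothetical helper: returns ['lat' => float, 'lng' => float, 'zoom' => int],
// or null when the path has no "#lat,lng,zoomz" fragment.
function parseMapFragment(string $path): ?array
{
    // -? keeps an optional minus sign inside the captured number
    if (!preg_match('/#(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?),(\d+)z/', $path, $m)) {
        return null;
    }
    return ['lat' => (float) $m[1], 'lng' => (float) $m[2], 'zoom' => (int) $m[3]];
}

$path = 'maps/place/Residences+Jardins+de+Majorelle/#33.536759,-7.613825,17z/data=!3m1!4b1!4m2!3m1!1s0xda62d6053931323:0x2f978f4d1aabb1aa';
print_r(parseMapFragment($path));
```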
How do I get the ID from a URL with preg_replace?
This is the link:
http://www.DDDD.com.br/photo/5b87f8eaa7c20f79c3257eb3ec0a35e0/id
How do I get the ID? In this case it would be: 5b87f8eaa7c20f79c3257eb3ec0a35e0
In this case I recommend not using preg_match (and preg_replace is meant for replacing something).
Simply use
$array = explode('/',$_SERVER['REQUEST_URI']);
$id = $array[2]; // for /photo/<id>/id the parts are '', 'photo', '<id>', 'id'
If you must use preg_match:
$array = array();
preg_match('#^/photo/([0-9a-f]{32})/id$#',$_SERVER['REQUEST_URI'],$array);
$id = $array[1];
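If the ID is the last segment of the URL, you can do this easily using strrpos to find the last /. For the URL in the question, which ends in /id, this returns id rather than the hash.
$url = $_SERVER['REQUEST_URI'];
if (($pos = strrpos($url, '/')) !== false) {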
$id = substr($url, $pos + 1);
}
else {
trigger_error('You must supply a valid photo ID');
}
If you would like to just extract that id string, you can use:
$id_url = "http://www.DDDD.com.br/photo/5b87f8eaa7c20f79c3257eb3ec0a35e0/id";
$pattern = "/photo\/([a-zA-Z0-9]*)/";
preg_match($pattern, $id_url, $output_array);
echo $output_array[1];
Or, to make the replacement:
$id_url = "http://www.DDDD.com.br/photo/5b87f8eaa7c20f79c3257eb3ec0a35e0/id";
$pattern = "/photo\/([a-zA-Z0-9]*)/";
$replacement = "your replacement";
$replaced_url = preg_replace($pattern, $replacement, $id_url);
echo $replaced_url;
PHP Live Regex - a useful tool for testing your patterns
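The approaches above can also be combined: split the path with parse_url and explode, then validate the segment with a small regex. This is only a sketch, assuming the URL always has the shape /photo/<32 hex characters>/...; the function name photoIdFromUrl is made up.

```php
<?php
// Hypothetical helper: returns the 32-character hex ID that follows /photo/,
// or null when the URL does not have that shape.
function photoIdFromUrl(string $url): ?string
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return null;
    }
    // "/photo/<id>/id" becomes ['photo', '<id>', 'id']
    $segments = explode('/', trim($path, '/'));
    if (count($segments) >= 2
        && $segments[0] === 'photo'
        && preg_match('/^[0-9a-f]{32}$/i', $segments[1])) {
        return $segments[1];
    }
    return null;
}

echo photoIdFromUrl('http://www.DDDD.com.br/photo/5b87f8eaa7c20f79c3257eb3ec0a35e0/id'), "\n";
```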
I have a URL like this:
/cp/foo-bar/another-testing
How can I parse it with the pattern
/cp/{0}-{1}/{2}
so that the results will be
0:foo
1:bar
2:another-testing
I need a general solution that parses any kind of URL with a pattern like that, I mean one using the {0}, {1} placeholders.
if (preg_match('#/cp/([^/]+?)-([^/]+?)/([^/]+)#', $url, $matches)) {
//look into $matches[1], $matches[2] and $matches[3]
}
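To make this general, the {0}, {1} placeholders can be translated into capture groups automatically. The following is a rough sketch, not a complete router: the function name matchTemplate is hypothetical, and it assumes each placeholder number appears only once and never spans a /.

```php
<?php
// Hypothetical helper: matches $url against a template such as
// "/cp/{0}-{1}/{2}" and returns the captured values keyed by placeholder
// number, or null when the URL does not fit the template.
function matchTemplate(string $template, string $url): ?array
{
    // Split into literal text (even indexes) and placeholder numbers (odd indexes)
    $parts = preg_split('/\{(\d+)\}/', $template, -1, PREG_SPLIT_DELIM_CAPTURE);

    $regex = '';
    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            // Named group "p<n>"; lazy so "foo-bar" splits at the first dash
            $regex .= '(?<p' . $part . '>[^/]+?)';
        } else {
            $regex .= preg_quote($part, '#');
        }
    }

    if (!preg_match('#^' . $regex . '$#', $url, $m)) {
        return null;
    }

    $result = [];
    foreach ($m as $key => $value) {
        if (is_string($key)) {                 // keep only the named groups
            $result[(int) substr($key, 1)] = $value;
        }
    }
    return $result;
}

print_r(matchTemplate('/cp/{0}-{1}/{2}', '/cp/foo-bar/another-testing'));
```

Named groups are used because the placeholder numbers in a template need not appear in ascending order.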
Instead of using {0}, {1}, {2}, I offer a new way: using {$s[0]}, {$s[1]}, {$s[2]}:
$your_url = '/cp/foo-bar/another-testing';
$s = explode('/', trim($your_url, '/')); // ['cp', 'foo-bar', 'another-testing']
$parts = explode('-', $s[1], 2); // ['foo', 'bar']
$s = array($parts[0], $parts[1], $s[2]);
//then
$result = "/cp/{$s[0]}-{$s[1]}/{$s[2]}";