How to explode a string without proper delimiter - PHP - php

I'm doing a custom page for my WP website and for this I'm getting the content of an existing WP page. Getting the content is not a problem but I'm getting it like this :
https://youtu.be/abcdefghijkhttps://youtu.be/kjihgfedcbahttps://youtu.be/abcdefghijk
I'd like to transform it like this :
https://youtu.be/abcdefghijk
https://youtu.be/kjihgfedcba
https://youtu.be/abcdefghijk
I tried explode('/', $content), but it doesn't give the result I want.
I don't know whether substr() would help in this case.
How can I separate the URLs properly, or at least extract each video's ID?

You can get all the video IDs as an array like this:
$str = "https://youtu.be/abcdefghijkhttps://youtu.be/kjihgfedcbahttps://youtu.be/abcdefghijk";
// Splitting on the common prefix leaves an empty first element, so filter it out.
$videos = array_values(array_filter(explode("https://youtu.be/", $str)));
If you print_r($videos); you will see all the IDs in an array.
If you want the array values as full URLs, prepend the prefix again:
$videos = array_map(function($value) { return 'https://youtu.be/'.$value; }, $videos);
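As an alternative to exploding and re-adding the prefix, the string can be split at the position just before each prefix with a zero-width lookahead. This is a minimal sketch, assuming every link starts with exactly https://youtu.be/; the helper name splitYoutubeUrls is made up for illustration.

```php
<?php
// Hypothetical helper: split a run of concatenated youtu.be links.
function splitYoutubeUrls(string $content): array
{
    // The lookahead (?=...) matches the empty position before each prefix,
    // so the prefix itself is kept in every piece. PREG_SPLIT_NO_EMPTY drops
    // the empty piece produced before the very first link.
    return preg_split('~(?=https://youtu\.be/)~', $content, -1, PREG_SPLIT_NO_EMPTY);
}

$content = "https://youtu.be/abcdefghijkhttps://youtu.be/kjihgfedcbahttps://youtu.be/abcdefghijk";
$urls = splitYoutubeUrls($content);

// Strip the prefix again if only the video IDs are needed.
$ids = array_map(function ($url) {
    return substr($url, strlen('https://youtu.be/'));
}, $urls);

echo implode("\n", $urls), "\n";
```

Keeping the prefix inside each piece avoids the empty first element that explode() produces.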

Related

Trying to grab value from html page but getting template back not the value - php

I am making a price crawler for a project but am running into a bit of an issue. I am using the below code to extract values from an html page:
$content = file_get_contents($_POST['url']);
$resultsArray = array();
$sqlresult = array();
$priceElement = explode( '<div>value I want to extract</div>' , $content );
Now when I use this to get certain elements I only get back
Finance: {{value * value2}}
I want to get the actual value that would be displayed on the screen e.g
Finance: 7.96
The other php methods I have tried are:
curl
file_get_html(using simple_html_dom library)
None of these work either :( Any ideas what I can do?
You passed <div>value I want to extract</div> as the delimiter, which means PHP splits your string into an array at every occurrence of it.
The following code uses the , character as the delimiter:
<?php
$string = "apple,banana,lemon";
$array = explode(',', $string);
echo $array[1];
?>
The output should be this:
banana
In your example you passed the value you want to extract as the delimiter, which is why you don't get it back. The delimiter has to be something that sits between the string you want and the strings you don't need.
For example:
<?php
$string = "iDontNeedThis-dontExtractNow-value I want to extract-dontNeedEither";
$priceElement = explode('-', $string);
echo "<div>".$priceElement[2]."</div>";
?>
The code should output this to your HTML page:
<div>value I want to extract</div>
And it will appear on your page like this:
value I want to extract
If you don't need to keep the whole array in a variable, you can store just one element of it instead:
$priceElement = explode('-', $string)[2];
echo $priceElement;
This stores only "value I want to extract", so you won't have to deal with arrays later on.
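The same idea can be applied to markup by using the text before and after the wanted value as two delimiters. The following is only a sketch: the helper name textBetween and the sample HTML with its class="price" attribute are assumptions for illustration, not taken from the page in the question.

```php
<?php
// Hypothetical helper: return the text between $open and $close, or null
// when either delimiter is missing.
function textBetween(string $haystack, string $open, string $close): ?string
{
    // Split once on the opening delimiter; index 1 is everything after it.
    $afterOpen = explode($open, $haystack, 2);
    if (count($afterOpen) < 2) {
        return null;
    }

    // Split the remainder once on the closing delimiter; index 0 is the value.
    $beforeClose = explode($close, $afterOpen[1], 2);
    if (count($beforeClose) < 2) {
        return null;
    }

    return $beforeClose[0];
}

// Assumed sample markup.
$html = '<p>Intro</p><div class="price">Finance: 7.96</div><p>Footer</p>';
$price = textBetween($html, '<div class="price">', '</div>');

echo $price, "\n";
```

The limit argument of 2 keeps explode() from splitting on later occurrences of the same delimiter.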

Parsing closed brackets in URL and http_build_query with it inserts number in closed bracket

I may not have explained this properly, but here we go.
I have a URL that looks like http://www.test.co.uk/?page=2&area[]=thing&area[]=thing2
Multiple "area"s can be added to or removed from the URL via links on the site. On each addition of an "area" I want to remove the "page" part of the URL so that it resets to page 1. I used parse_url to take that part out.
Then I built an HTTP query so it could generate the URL properly without "page".
This resulted in "area%5B0%5D=" and "area%5B1%5D=" instead of "area[]=".
When I use urldecode, it shows "area[0]=" and "area[1]=".
I need it to be "[]" because the link that removes an area checks for "[]="; when it's "[0]" it doesn't recognise it. How do I keep it as "[]="?
See code below.
$currentURL = currentURL();
$parts = parse_url($currentURL);
parse_str($parts['query'], $query);
unset($query['page']);
$currenturlfinal = http_build_query($query);
urldecode($currenturlfinal);
$currentURL = "?" . urldecode($currenturlfinal);
This is what I've done so far. It fixes how the URL looks, but I don't think I've solved anything: I've realised that 'area' and 'thing' are no longer recognised as $key or $val, which I think is a result of parsing or re-encoding the URL in the code below. So I still can't remove areas using the links.
$currentURL_with_QS2 = currentURL();
$parts = parse_url($currentURL_with_QS2);
parse_str($parts['query'], $query);
unset($query['page']);
$currenturlfinal = http_build_query($query);
$currenturlfinal = preg_replace('/%5B[0-9]+%5D/simU', '[]', $currenturlfinal);
urldecode($currenturlfinal);
$currentURL_with_QS = "?" . $currenturlfinal;
$numQueries = count(explode('&', $_SERVER['QUERY_STRING']));
$get = $_GET;
if (activeCat($val)) { // if this category is already set
$searchString = $key . '[]= ' . $val; // we build the query string to remove
I'm using Wordpress as well may I add - maybe there's a way to reset the pagination through Wordpress. of course even then - when I go to page 2 on any page it still changes the "[]" to "5b0%5d" etc....
EDIT: this is all part of a function that refers to $key (the area/category) and $val (name of area or category) which is echoed in the link itself
EDIT2: It works now!
I don't know why but I had to use the original code and make the adjustments I did before again and now it works exactly how I want it to! Yet I couldn't see any visible differences in both codes afterwards. Strange...
As far as I know, there is no built-in way to do this.
You could try with:
$currenturlfinal = http_build_query($query);
where $query is the parsed query string without the area parameters, and then append each area manually:
foreach ($areas as $area) {
$currenturlfinal .= '&area[]='.$area;
}
Update:
You could also try:
$currenturlfinal = preg_replace('/%5B[0-9]+%5D/', '%5B%5D', $currenturlfinal);
Place it right after the http_build_query call, so that it runs on the built string rather than on the $query array.
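Putting the pieces together, the whole step could be sketched as one function. The function name queryWithoutPage is hypothetical, and the sketch assumes that every numeric index produced by http_build_query should be turned back into "[]".

```php
<?php
// Hypothetical helper: rebuild the query string of $url without "page",
// keeping array parameters in their "name[]=value" form.
function queryWithoutPage(string $url): string
{
    $queryString = parse_url($url, PHP_URL_QUERY);
    if (!is_string($queryString)) {
        return ''; // no query string, or a malformed URL
    }

    parse_str($queryString, $query);
    unset($query['page']);

    // http_build_query() encodes array keys as %5B0%5D, %5B1%5D, ...
    $built = http_build_query($query);

    // Turn every encoded numeric index back into "[]". Caveat: a value that
    // itself contains text like "[0]" would be rewritten as well.
    $built = preg_replace('/%5B\d+%5D/', '[]', $built);

    return $built === '' ? '' : '?' . $built;
}

echo queryWithoutPage('http://www.test.co.uk/?page=2&area[]=thing&area[]=thing2'), "\n";
```

Because the replacement runs on the still-encoded string, no urldecode() call is needed afterwards.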

$GET query to directory?

I want to fetch YouTube videos with the script below. Right now the code reads the keyword from a GET parameter, as in example.com/s=keyword, and I want it to read the keyword from the path instead, as in example.com/HERE.
As you can see, it uses $_GET['s'].
So the function currently works like this:
example.com/s=keyword
and I want it to work like this:
example/page/keyword
Sorry for my bad English.
$keyword = $_GET['s'];
file_get_contents("https://www.googleapis.com/youtube/v3/search?part=snippet&q=$keyword&type=video&key=abcdefg&maxResults=5");
Have a look at $_SERVER['REQUEST_URI'].
It contains the path and query string of the current request. You can then process it with simple string or array functions to get the parameters, for example:
$current_url = $_SERVER['REQUEST_URI'];
// Trim the leading slash so the first array element is not empty.
$url_arr = explode("/", trim($current_url, "/"));
Then access the parameters by array index,
e.g. $page = $url_arr[0];
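A slightly more defensive version of the same idea is sketched below. It assumes the web server already routes paths such as /page/keyword to this script and that the keyword is always the last path segment; the function name keywordFromUri is made up for illustration.

```php
<?php
// Hypothetical helper: take the last path segment of a request URI.
function keywordFromUri(string $requestUri): string
{
    // Keep only the path, so a query string does not end up in the keyword.
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path)) {
        return '';
    }

    // Trim the slashes first, otherwise the first element would be empty.
    $segments = explode('/', trim($path, '/'));

    // Decode percent-escapes such as %20.
    return rawurldecode(end($segments));
}

// In a real request this would be $_SERVER['REQUEST_URI'].
$keyword = keywordFromUri('/page/funny%20cats?ref=1');

// Encode the keyword again before placing it in the API URL from the question.
$apiUrl = 'https://www.googleapis.com/youtube/v3/search?part=snippet&q='
    . rawurlencode($keyword) . '&type=video&key=abcdefg&maxResults=5';

echo $keyword, "\n";
```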

url_title codeigniter showing underscores instead of hyphens

Hi I am trying to insert my article title in database in the following format this-is-a-new-title for the input This is a new Title. For this I have written :
$title = $this->input->post('topic_title');
$topic_slug_title = url_title($title,'-',TRUE);
But echo $topic_slug_title shows titles like this_is_a_new_title. Why are underscores added when I passed a hyphen?
OK, I found the solution: pass TRUE as both the second and the third parameter. That is:
$title = $this->input->post('topic_title');
$topic_slug_title = url_title($title,TRUE,TRUE);
Don't use the third parameter; lowercase the result yourself, like this:
$title = $this->input->post('topic_title');
$topic_slug_title = strtolower(url_title($title));
Don't use the second and third parameters; write it like this:
$title = $this->input->post('topic_title');
$topic_slug_title = url_title($title);
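For comparison, the transformation the question asks for can be sketched without the framework. This is not CodeIgniter's url_title implementation, only a hypothetical stand-alone function (simpleSlug) that assumes ASCII titles.

```php
<?php
// Hypothetical helper, not part of CodeIgniter: build a lowercase,
// hyphen-separated slug from an ASCII title.
function simpleSlug(string $title): string
{
    $slug = strtolower($title);

    // Collapse every run of characters that are not a-z or 0-9 into one hyphen.
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);

    // Remove hyphens left over at either end.
    return trim($slug, '-');
}

echo simpleSlug('This is a new Title'), "\n";
```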

Scraping complete web site for data within specific div tag where url includes string

I own a webshop and one of my suppliers is kind enough to give me a CSV file with product model numbers, price and title but they can't give me database dumps including their product descriptions. I am allowed to scrape the product descriptions though - the question is how?
All URLs include the model number like "title-of-product-MN-504-1.htm"
The descriptions are inside a <div> tag like "<div id="description"> Bla bla bla <other tag>bla bla </other tag> bla bla </div>"
Let's say I have all the model numbers in a CSV file or MySQL table. How can I save the description associated with the model number in each URL (the model number is also located within another div tag, if that's easier)?
To sum up - input will be model numbers from a csv or MySQL table and the output should be a MySQL table(or csv) with the model numbers and the description from the div tag on individual pages.
I'm considering the following tools but I'm unsure how to connect them to do what I want: wget, cURL and PHP Simple HTML DOM Parser
You could use this http://phpcrawl.cuab.de/ and use this particular property: http://phpcrawl.cuab.de//classreferences/index.html, then to find the description : Extract string between html tags in php
As for your requirement of finding the modelnumber in URL's found on the crawled page you could use the following property: http://phpcrawl.cuab.de/classreferences/index.html
If you index the CSV file you got from them and then crawl their site, I'd do the following:
1. Build a list of all the model numbers whose descriptions you need.
2. Crawl their front page to start the process. Gather URLs and add them to the visit list.
3. Visit every URL in your list that matches a model number, get the description, and remove the model from the list. Gather URLs and add them to the visit list.
4. Go back to step 2 and repeat until there are no more models on your list.
As for how to get the URLs with the modelnumber in them: http://php.net/manual/en/function.strpos.php
Something like this, I leave the implementation up to you:
foreach($list_of_urls as $url) {
    foreach($list_of_modelnumbers as $model) {
        // strpos() returns 0 for a match at the start, so compare against false.
        if(strpos($url, $model) !== false) {
            $list_of_urls_to_crawl[] = $url;
            /* you can also remove the $model, but I already wrote it in a foreach loop */
            break;
        }
    }
}
Then you can clear the $list_of_urls and append the new ones from the crawler results :)
foreach($list_of_urls_to_crawl as $url) {
    //Set $crawler, let him go, get your description etc.
    // Use a separate variable so the outer $url is not overwritten.
    foreach($crawler->links_found as $found_url) {
        $list_of_urls[] = $found_url;
    }
}
And place it in a grand while($still_need_descriptions) loop.
Alternatively, if you don't like http://phpcrawl.cuab.de/, you could use PHP-Spider.
It would be as simple as writing a custom URL discoverer based on the CSV and then parsing the crawled pages with XPath queries. See the example on https://mvdbos.github.io/php-spider/. The only thing you would need to change is the Discoverer class that is added to the Spider. Assuming you know how the URLs are built, it could look like this:
class CsvModelNumberDiscoverer implements Discoverer
{
    protected $modelNumbersAndTitles = array();

    public function __construct(array $modelNumbersAndTitles)
    {
        $this->modelNumbersAndTitles = $modelNumbersAndTitles;
    }

    public function discover(Spider $spider, Resource $document)
    {
        $urls = array();
        foreach ($this->modelNumbersAndTitles as $number => $title) {
            $urls[] = 'http://example.com/' . $title . '-MN-' . $number . '.htm';
        }
        return $urls;
    }
}
The code where you run the spider would look like this:
$spider = new Spider('http://www.example.com');
$spider->addDiscoverer(new CsvModelNumberDiscoverer($modelNumbersAndTitles));
$result = $spider->crawl();
Finally, you could get the descriptions from the results like this:
foreach ($result['queued'] as $resource) {
    // XPath selects attributes with @, and // searches the whole document.
    $modelNo = $resource->getCrawler()->filterXpath("//div[@id='modelNo']")->text();
    $description = $resource->getCrawler()->filterXpath("//div[@id='description']")->text();
}
If you don't know how the URLs are built, you would have to spider the whole site (as in AmazingDreams' answer) and use the discoverer to match URLs to the list of model numbers. It takes more time, though.
Full disclosure: I wrote PHP-Spider.
You can first get the html code using
$homepage = file_get_contents('http://www.example.com/title-of-product-MN-504-1.htm');
Then you use the html code with the php dom parser, to get the value of the exact elements you need.
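The parsing step could be sketched with PHP's built-in DOMDocument and DOMXPath classes instead of PHP Simple HTML DOM Parser. The function name extractDescription and the sample markup are assumptions for illustration; with a real page, the string passed in would be the result of file_get_contents.

```php
<?php
// Hypothetical helper: return the text of <div id="description">, or null
// when the page has no such element.
function extractDescription(string $html): ?string
{
    $doc = new DOMDocument();

    // Real-world markup is rarely valid; collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query("//div[@id='description']");
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    // textContent includes the text of nested tags.
    return trim($nodes->item(0)->textContent);
}

// Assumed sample page.
$homepage = '<html><body><div id="description"> Bla bla <b>bold</b> bla </div></body></html>';
$description = extractDescription($homepage);

echo $description, "\n";
```

The returned description can then be stored next to the model number taken from the URL.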
