This is my first question on this site, so sorry if it is not clear enough.
My problem is that I would like to get all of the product IDs from a webshop that has no API.
A product ID looks like: xy-000000
I know that I need a web scraper, but I don't know how to find a specific string like xy- 000000 with it. I tried many web scrapers, but the only things I could find with them were HTML tags like the title or keywords.
I searched a lot on Google and found some web scrapers, but they are not working well for me.
As I mentioned, I would like to get all of the product IDs from a different webshop using PHP, to find products that I am not selling. (My webshop has the same product IDs as the other one.)
Can anyone please help me find a PHP script that is similar to what I need?
This is the code that I am trying to use:
<?php
$data = file_get_contents('https://www.mesemix.hu/hu/superman-ruhanemuk/11292-szuperhosoek-mintas-zokni.html');
error_reporting(0);
preg_match('/<title>([^<]+)<\/title>/i', $data, $matches);
$title = $matches[1];
preg_match('/[0-9]{6}/', $data, $matches);
$number = $matches[1];
preg_match('/<img[^>]*src=[\'"]([^\'"]+)[\'"][^>]*>/i', $data, $matches);
$img = $matches[1];
echo $title."<br>\n";
echo $img."<br>\n";
echo $number;
echo $data;
?>
The problem is that I cannot find the 6-digit number with it ($number).
In the webshop's source code it looks like this:
var productReference = 'SP- 418070';
If there is anything wrong with my question please let me know.
The term you are looking for is "web scraper".
You can do it in a couple of different ways, for example with one of these two PHP libraries:
http://simplehtmldom.sourceforge.net/
or
https://github.com/FriendsOfPHP/Goutte
Both are very simple to use, and both are documented.
They work much like jQuery (JavaScript): you target the data you need with CSS selectors.
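For this particular page a library may not even be needed. In the question's code, the pattern /[0-9]{6}/ has no capture group, so $matches[1] is never set (the whole match is in $matches[0]), and error_reporting(0) hides the resulting notice. A minimal sketch of one way to pull the reference out of the inline script follows. It assumes the value always appears as productReference = '...' as shown in the question; the function name extractProductReference and the sample $data string are made up for illustration.

```php
<?php
// Hypothetical sample fragment; in practice $data would come from
// file_get_contents($url) as in the question.
$data = "<script>var productReference = 'SP- 418070';</script>";

function extractProductReference($html)
{
    // Capture whatever sits between the quotes after "productReference =".
    if (preg_match('/productReference\s*=\s*[\'"]([^\'"]+)[\'"]/', $html, $m)) {
        // Drop inner whitespace so 'SP- 418070' becomes 'SP-418070'.
        return preg_replace('/\s+/', '', $m[1]);
    }
    return null; // no reference found on this page
}

echo extractProductReference($data), "\n";
```

To collect every ID in the shop, the same function would have to be run over each product page URL, which still needs to be gathered separately (for example from category pages).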
Related
I'm trying to pull each of the viewer names from a JSON file in PHP.
I've looked around the Internet extensively for a working example that will offer me the result I desire without much success.
I'm really struggling to find an example that fits my needs on the Internet to help me with what is likely a very simple thing to accomplish.
I've got a JSON file that spits out several values on the Internet and I'm looking to extract every single line from one particular section.
Seeing a working example will likely help me understand what I am doing.
The JSON file that I am using for example is:
https://tmi.twitch.tv/group/user/dansgaming/chatters
I am trying to extract each single line from the "viewers" section in this file.
I've captured the data using the following PHP:
$testviewers = json_decode(@file_get_contents('https://tmi.twitch.tv/group/user/' . $streamName . '/chatters'), true);
var_dump($testviewers['chatters']['viewers']);
It turns out this isn't having the desired result for me.
I simply want each line in the "viewers" section to be echoed out with line breaks.
What am I doing wrong? I've tried about two hundred different approaches to this one and have to admit this is my first real time working with JSON.
I've tried to search the Internet for answers and found many tutorials but none have made any sense to me and I know that seeing how to accomplish the result will help me learn exactly what should be going on.
In an ideal world, it will simply output each "viewer" on a separate line that I can work with. If I could echo each of them with a line break after it, or the word "viewer:" before it, this would be a huge help and I'll be able to take it further and likely learn a great deal in the process.
This is how I echo values from JSON:
$json = json_decode($response, true);
// 'viewers' is a plain list of user names under 'chatters'
if (!empty($json['chatters']['viewers']))
{
$viewers = $json['chatters']['viewers'];
}
else
{
$viewers = array();
}
foreach ($viewers as $viewer)
{
// Collapse runs of whitespace before printing
$viewer = trim(preg_replace('/\s\s+/', ' ', $viewer));
echo 'VIEWER = ' . $viewer . '<br>';
}
Just make sure the array you pass to foreach is not empty; maybe this helps.
Turns out the issue here was a PHP error that wasn't displaying. The code was timing out because of how large the JSON file was and a low limit on my machine.
I think this is what you want....
$array = json_decode($your_json,true);
foreach($array['chatters']['viewers'] as $r) echo $r.'<br>';
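For reference, here is a self-contained sketch of the same loop with a basic check that decoding worked. The sample JSON string is invented to mimic the shape described in the question (a "chatters" object holding a "viewers" list), and the function name viewerLines is hypothetical.

```php
<?php
// Hypothetical sample shaped like the chatters response in the question.
$response = '{"chatter_count":3,"chatters":{"moderators":["mod1"],"viewers":["alice","bob"]}}';

function viewerLines($json)
{
    $data = json_decode($json, true);
    // json_decode() returns null on invalid input, so guard before indexing.
    if (!is_array($data) || !isset($data['chatters']['viewers'])) {
        return array();
    }
    $lines = array();
    foreach ($data['chatters']['viewers'] as $name) {
        $lines[] = 'viewer: ' . $name;
    }
    return $lines;
}

// One viewer per line, separated by HTML line breaks.
echo implode("<br>\n", viewerLines($response)), "\n";
```

Returning an array instead of echoing inside the loop keeps the names available for further processing, which is what the question asks for.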
As part of a form I collect a user's bank details, which include their sort code. The sort code is stored in the database as six digits, for example 771731.
I'd prefer to store the sort code in this format (without the hyphens) as it makes it easier for me to work with in other areas of the site. I want to output the sort code in one area of the site but in the format 77-17-31 using PHP.
I have searched Google for a solution but surprisingly found very little.
This is the perfect use case for wordwrap(); just use it like this:
$code = "771731";
echo wordwrap($code, 2, "-", TRUE);
output:
77-17-31
Just figured this out using PHP functions, thought I'd post the answer in case it helped anyone else doing a Google search...
function formatSortCode($str) {
return implode("-", str_split($str, 2));
}
echo formatSortCode('771731'); // outputs 77-17-31
I arrived here looking for a JavaScript answer, not realising it was a PHP question. Tweaking @MichaelLB's answer, the JavaScript translation is
function formatSortCode(str) {
return (String(str).match(/.{1,2}/g) || []).join('-');
}
formatSortCode(123456789);
>>> "12-34-56-78-9"
I am trying to extract URLs from a large number of Google search results. Getting them from the source code is proving to be quite challenging, as the delimiters are not clear and not all of the URLs are in the code. Is there a tool that can extract URLs from a certain area of an image? If so, that may be a better solution.
Any help would be much appreciated.
Try using the JSON/Atom Custom Search API instead: http://code.google.com/apis/customsearch/v1/overview.html. It gives you 100 API calls per day, which you can increase to 10000 per day if you pay.
Use this excellent lib: http://simplehtmldom.sourceforge.net/manual.htm
// Load the library (assumes simple_html_dom.php is on the include path)
include 'simple_html_dom.php';
// Grab the source code
$html = file_get_html('http://www.google.com/');
// Find all anchors; find() returns an array of element objects
$ret = $html->find('a');
// Read the href attribute of each anchor
foreach ($ret as $a) {
$value = $a->href;
}
Edit:
All "natural" search URLs seem to be in the #res div. With simplehtmldom, find the first #res, then all anchors inside it. I don't remember the exact syntax, but it should be something like this:
$ret = $html->find('div[id=res]', 0)->find('a');
or maybe
$html->find('div[id=res] a');
I have a file that contains a bunch of links:
<a href="http://site1.com">site 1</a>
<a href="http://site2.com">site 2</a>
<a href="http://site3.com">site 3</a>
I want to get the URL to a link with specific text. For example, search for "site 2" and get back "http://site2.com"
I tried this:
preg_match("/.*?[Hh][Rr][Ee][Ff]=\"(.*?)\">site 2<\/[Aa]>.*/", $contents, $match)
(I know the HREF= will be the last part of the anchor)
But it returns
http://site1.com">site 1</a><a href="http://site2.com
Is there a way to do a search backwards, or something? I know I can do preg_match_all and loop over everything, but I'm trying to avoid that.
Try this:
preg_match("(<a\s[^>]*?href=[\"']([^\"']+)[\"'][^>]*>site 2</a>)i",$contents,$match);
$result = $match[1];
Hope this helps!
Or you can try using phpQuery.
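If a regex feels too fragile, PHP's bundled DOM extension can do the same lookup. The following is only a sketch: the function name findHrefByText and the sample $contents string are made up for illustration, and it does loop over the anchors internally, which the question hoped to avoid.

```php
<?php
// Hypothetical sample input; in practice $contents would be read from the file.
$contents = '<a href="http://site1.com">site 1</a>'
          . '<a href="http://site2.com">site 2</a>'
          . '<a href="http://site3.com">site 3</a>';

function findHrefByText($html, $text)
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    foreach ($doc->getElementsByTagName('a') as $a) {
        // Compare the visible link text, ignoring surrounding whitespace.
        if (trim($a->textContent) === $text) {
            return $a->getAttribute('href');
        }
    }
    return null; // no link with that text
}

echo findHrefByText($contents, 'site 2'), "\n";
```

Because the parser handles attribute order and quoting, this does not depend on href being the last part of the anchor.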
How can I, in PHP, get a summary of any URL? By summary, I mean something similar to the URL descriptions in Google web search results.
Is this possible? Is there already some kind of tool I can plug in to so I don't have to generate my own summaries?
I don't want to use metadata descriptions if possible.
-Dylan
What displays in Google is (generally) the META description tag. If you don't want to use that, you could use the page title instead though.
If you don't want to use metadata descriptions (btw, this is exactly what they are for), you have a lot of research and work to do. Essentially, you have to guess which part of the page is content and which is just navigation/fluff. Indeed, Google has exactly that; note however, that extracting valuable information from useless fluff is their #1 competency and they've been researching and improving that for a decade.
You can, of course, make an educated guess (e.g. "look for an element with ID or class maincontent" and get the first paragraph from it) and maybe it will be OK. The real question is, how good do you want the results to be? (Facebook has something similar for linking to websites, sometimes the summary just insists that an ad is the main content).
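The educated guess described above could be sketched roughly like this with PHP's DOM extension. The id/class name "maincontent" comes from the example in the text; the function name guessSummary, the fallback to the first paragraph anywhere on the page, and the sample HTML are assumptions made for illustration, and real pages will often need a different heuristic.

```php
<?php
function guessSummary($html)
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    // Paragraphs inside an element whose id or class is "maincontent".
    $nodes = $xpath->query(
        '//*[@id="maincontent" or contains(concat(" ", normalize-space(@class), " "), " maincontent ")]//p'
    );
    if ($nodes->length === 0) {
        // Assumed fallback: take the first paragraph anywhere on the page.
        $nodes = $xpath->query('//p');
    }
    if ($nodes->length === 0) {
        return null;
    }
    // Collapse whitespace so the summary is a single line.
    return trim(preg_replace('/\s+/', ' ', $nodes->item(0)->textContent));
}

// Hypothetical sample page.
$page = '<html><body><div id="nav"><p>Home | About</p></div>'
      . '<div class="maincontent"><p>First real paragraph.</p><p>Second.</p></div>'
      . '</body></html>';
echo guessSummary($page), "\n";
```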
The following will allow you to parse the contents of a page's title tag. Note: PHP must be configured to allow file_get_contents to retrieve URLs. Otherwise you'll have to use curl to retrieve the page HTML.
$title_open = '<title>';
$title_close = '</title>';
$page = file_get_contents( 'http://www.domain.com' );
// Offset of the first character after the opening tag
$n = stripos( $page, $title_open ) + strlen( $title_open );
// Offset of the closing tag
$m = stripos( $page, $title_close );
$title = substr( $page, $n, $m - $n );
While I hate promoting a service, I have found this:
embed.ly
It has an API that returns JSON with all the data you need.
But I am still searching for a free/open-source library to do the same thing.