I found a PHP snippet and modified it for my needs. It uses Simple HTML DOM to parse data from a web page.
Here is part of my code:
$html = new simple_html_dom();
$html->load_file($page);
$items = $html->find('ul[class=history_list]');
foreach($items as $post) {
$items = $post->find('li[class=sub_pop_headline]');
$title=$items->plaintext;
foreach($items as $post) {
# remember comments count as nodes
$items_data = strip_tags($post);
echo $query = "INSERT INTO `mysqltable` ( `entry_id`,`domain_id`, `keyword`, `data`) VALUES ('',1, '$items_data', 'a:1:{s:11:\"{%KEYWORD%}\";s:11:\"$items_data\";}')";
$query_submit = mysqli_query($conn,$query);
The data is fetched correctly (that part works), but it is inserted into the SQL table with a lot of blank space around it.
Here is what the entry should look like
(columns keyword and data):
my fetched title | a:1:{s:11:"{%KEYWORD%}";s:9:"my fetched title";}
But my code produces this instead:
my fetched title| a:1:{s:11:"{%KEYWORD%}";s:9:"my fetched title";}
As you can see, there is a lot of extra whitespace, so the entry is not correct.
Thank you very much for your help. I'm not really a coder...
Sounds like all you need to do is trim() the title:
$title = trim($items->plaintext);
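As a minimal sketch of the same idea applied to the value that is actually inserted: the `$raw` string below is a hypothetical stand-in for `strip_tags($post)`, not output from the real page. It also lets `serialize()` compute the `s:<length>` prefix, on the assumption that the `data` column holds a PHP-serialized array; a hard-coded `s:11:` only matches values that are exactly 11 bytes long.

```php
<?php
// Hypothetical raw value, standing in for strip_tags($post) from the question.
$raw = "   my fetched   title \n\t ";

// Collapse runs of internal whitespace to one space, then trim both ends.
$keyword = trim(preg_replace('/\s+/', ' ', $raw));

// serialize() writes the correct byte length for each string itself.
$data = serialize(array('{%KEYWORD%}' => $keyword));

echo $keyword, "\n"; // my fetched title
echo $data, "\n";    // a:1:{s:11:"{%KEYWORD%}";s:16:"my fetched title";}
```

Both values would still need escaping, or better a prepared statement, before they go into the INSERT query.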
I am trying to scrape http://spys.one/free-proxy-list/, but I only want the values from the Proxy ip:port column.
I checked the website and there are 3 tables.
Can anyone help me out?
<?php
require "scrapper/simple_html_dom.php";
$html=file_get_html("http://spys.one/free-proxy-list/");
$html=new simple_html_dom($html);
$rows = array();
$table = $html->find('table',3);
var_dump($table);
Try the below script. It should fetch you only the required items and nothing else:
<?php
include 'simple_html_dom.php';
$url = "http://spys.one/free-proxy-list/";
$html = file_get_html($url);
foreach($html->find("table[width='65%'] tr[onmouseover]") as $file) {
$data = $file->find('td', 0)->plaintext;
echo $data . "<br/>";
}
?>
The output it produces looks like:
176.94.2.84
178.150.141.93
124.16.84.208
196.53.99.7
31.146.161.238
I really don't know what your Simple HTML DOM library does. In any case, PHP nowadays has everything on board that you need for parsing specific DOM elements. Just use PHP's own DOMXPath class to query them.
Here's a short example for getting the first column of a table.
$dom = new \DOMDocument();
// loadHTMLFile() fetches and parses a URL or path; loadHTML() expects a string of markup
$dom->loadHTMLFile('https://your.url.goes.here');
$xpath = new \DOMXPath($dom);
// query the first cell with class "value" in each row of the table with class "attributes"
$elements = $xpath->query('//table[@class="attributes"]//tr/td[@class="value"][1]');
// iterate through all found td elements
foreach ($elements as $element) {
echo $element->nodeValue;
}
This is only one possible example. It does not solve your exact issue with http://spys.one/free-proxy-list/, but it shows how you could easily get the first column of a specific table. The only thing left to do is find the right query for the table you want within the DOM of that site. Because the site uses a fairly complex, old-fashioned table layout and the table you want to parse has no unique id or similar hook, you will have to work that out yourself.
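To make the first-column idea concrete without depending on the live site, here is a small self-contained sketch. The markup, the `attributes` class name and the second column are all made up for illustration; only the two addresses are borrowed from the sample output earlier in the thread.

```php
<?php
// Hypothetical markup; the real page has a different, more complex layout.
$markup = '<table class="attributes">
  <tr><td class="value">176.94.2.84</td><td>DE</td></tr>
  <tr><td class="value">178.150.141.93</td><td>UA</td></tr>
</table>';

$dom = new DOMDocument();
$dom->loadHTML($markup);          // loadHTML() parses a string of markup
$xpath = new DOMXPath($dom);

// td[1] is evaluated per row, so this yields the first cell of every row.
$cells = array();
foreach ($xpath->query('//table[@class="attributes"]//tr/td[1]') as $td) {
    $cells[] = trim($td->nodeValue);
}

print_r($cells);
```

Note that XPath positions start at 1, so `td[1]` is the first column.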
I have an API request that I send to a database, and the results are returned as XML. This works fine, and I can output the results to the screen with no problem. The XML feed is a long list of property details. What I am trying to do is store the results in a MySQL database, using the code below.
$feeds = array('http://web.demo.net/ademo_search.xml? &upw=123456');
foreach( $feeds as $feed ) {
$xml = simplexml_load_file($feed);
foreach($xml->channel->item as $item)
{
mysql_query("INSERT INTO property1 (id, department, branch, address1)
VALUES (
'',
'".mysql_real_escape_string($item->id)."',
'".mysql_real_escape_string($item->department)."',
'".mysql_real_escape_string($item->branch)."',
'".mysql_real_escape_string($item->address1)."')");
}
}
When I run this code I don't get any errors, but the data is not added to the database either.
Here is a link to the XML structure. As you will see, for my test I am only trying to insert the first few items.
This is exactly what you need :)
I hope you like it
$feed = "test.xml";
$xml = simplexml_load_file($feed);
$arr = array();
$i = 0;
foreach($xml->houses->property as $item){
$arr[$i][] = (string) $item->id;
$arr[$i][] = (string) $item->department;
$arr[$i][] = (string) $item->branch;
$arr[$i][] = (string) $item->address->address1;
$i++;
}
var_dump($arr); // now you have them in an array .. store them in db ;)
first: the URL in the feeds array contains a few spaces. This might cause a problem when the URL is parsed and lead to a 404 error or similar (I'm not sure about that, but check it).
second: in the query, remove the '' because you list 4 columns but pass 5 values to them.
third: you might want to see the MySQL error by adding or die(mysql_error()) after the query, before the semicolon.
fourth: consider replacing mysql with mysqli or PDO.
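The second and fourth points can be sketched together as below. This is only an illustration under stated assumptions: the inline XML (including the `root` element name and the sample values) is invented, the `houses`/`property`/`address` nesting is taken from the other answer, and an in-memory SQLite database stands in for MySQL so the example runs on its own (it needs the pdo_sqlite extension).

```php
<?php
// Hypothetical feed; the root element name and all values are assumptions.
$xmlString = '<root><houses>
  <property><id>1</id><department>Sales</department><branch>Main</branch>
    <address><address1>1 High Street</address1></address></property>
  <property><id>2</id><department>Lettings</department><branch>North</branch>
    <address><address1>2 Low Road</address1></address></property>
</houses></root>';
$xml = simplexml_load_string($xmlString);

// In-memory SQLite stands in for MySQL; for MySQL only the DSN would change.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION); // surface SQL errors
$pdo->exec('CREATE TABLE property1 (id TEXT, department TEXT, branch TEXT, address1 TEXT)');

// Four columns, four placeholders; the driver handles quoting.
$stmt = $pdo->prepare(
    'INSERT INTO property1 (id, department, branch, address1) VALUES (?, ?, ?, ?)'
);
foreach ($xml->houses->property as $item) {
    $stmt->execute(array(
        (string) $item->id,
        (string) $item->department,
        (string) $item->branch,
        (string) $item->address->address1,
    ));
}

$count = (int) $pdo->query('SELECT COUNT(*) FROM property1')->fetchColumn();
echo "Inserted $count row(s)\n";
```

With the error mode set to exceptions, a column/value mismatch like the one in the question would raise an error instead of failing silently.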
I am trying to get a JSON response using PHP. I want just the JSON array, without any HTML tags, but the output contains HTML tags as well. I want to remove this HTML output, and I don't know how to do it. The PHP code is as follows. Please help.
Thanks in advance :)
<?php
function getFixture(){
$db = new DbConnect();
// array for json response of full fixture
$response = array();
$response["fixture"] = array();
$result = mysql_query("SELECT * FROM fixture"); // Select all rows from fixture table
while($row = mysql_fetch_array($result)){
$tmp = array(); // temporary array to create single match information
$tmp["matchId"] = $row["matchId"];
$tmp["teamA"] = $row["teamA"];
$tmp["teamB"] = $row["teamB"];
array_push($response["fixture"], $tmp);
}
header('Content-Type: application/json');
echo json_encode($response);
}
getFixture();
?>
It's difficult to tell without seeing what your output is, but there is nothing in your code which would add HTML to your response.
It sounds like the HTML is in the database, so you're getting the data as expected, and your browser is then displaying whatever HTML elements might be there.
You can make sure none of the values from the database contain HTML by using strip_tags as follows:
$tmp["teamA"] = strip_tags($row["teamA"]);
Do this for every field that may contain HTML.
Sorry if this is not formatted right, I'm new to StackOverflow!
http://php.net/strip-tags
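If several fields may contain markup, the same call can be applied to a whole row at once. A minimal sketch, assuming rows shaped like the ones in the question; the `$rows` array and its values are hypothetical stand-ins for the database results:

```php
<?php
// Hypothetical rows standing in for the results of the SELECT query.
$rows = array(
    array('matchId' => '1', 'teamA' => '<b>Lions</b>', 'teamB' => 'Tigers<br>'),
);

$response = array('fixture' => array());
foreach ($rows as $row) {
    // array_map() keeps the string keys and strips tags from every value.
    $response['fixture'][] = array_map('strip_tags', $row);
}

$json = json_encode($response);
echo $json, "\n"; // {"fixture":[{"matchId":"1","teamA":"Lions","teamB":"Tigers"}]}
```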
I have an XML file. I want to save all the data from the XML file to the database.
The structure of the XML file looks like this:
<STORY>
<BYLINE>abc</BYLINE>
<STORYID>123456</STORYID>
</STORY>
The code I am using to store the data in the database is:
$dom = new DOMDOcument();
$dom->loadXML(equitymarketnews/$zname);
$xpath = new DOMXpath($dom);
$res = $xpath->query("//STORY/");
$allres = array();
foreach($res as $node){
$result = array();
$byline = mysql_real_escape_string($node->getElementsByTagName("BYLINE")->item(0)->nodeValue);
$storyid = mysql_real_escape_string($node->getElementsByTagName("STORYID")->item(0)->nodeValue);
}
$sql12="insert into equitymarketnews values('$byline','$storyid')";
mysql_query($sql12);
Nothing useful ends up in my database. All values are blank.
Where am I going wrong?
I think something is wrong with this line:
$res = $xpath->query("//STORY/");
I want to store the data (i.e. abc and 123456) from the XML file in a database table.
I don't really know what your question is, but assuming that the code you posted does not work as you expect, one thing I noticed is where the record is inserted:
$sql12="insert into equitymarketnews values('$byline','$storyid','$pubdate','$author','$cat','$subcat','$titleline','$subtitleline,'$storymain','$flag')";
mysql_query($sql12);
Shouldn't it be inside your foreach loop? As posted, the query runs only once, after the loop has finished, so at best the last story is inserted.
In my opinion it should look something like this:
foreach($res as $node){
$result = array();
$byline = mysql_real_escape_string($node->getElementsByTagName("BYLINE")->item(0)->nodeValue);
$storyid = mysql_real_escape_string($node->getElementsByTagName("STORYID")->item(0)->nodeValue);
$sql12="insert into equitymarketnews values('$byline','$storyid')";
mysql_query($sql12);
}
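Two further details in the question's code are worth checking. `loadXML()` expects a string of XML, while `load()` takes a file path, and an XPath expression cannot end in a slash, so `//STORY/` is rejected where `//STORY` is valid. A minimal sketch under those assumptions follows; the `STORIES` wrapper element and the second story are invented for illustration, and the rows are collected in an array rather than sent to a database:

```php
<?php
// Hypothetical document; the STORIES root and the second STORY are assumptions.
$xml = '<STORIES>
  <STORY><BYLINE>abc</BYLINE><STORYID>123456</STORYID></STORY>
  <STORY><BYLINE>def</BYLINE><STORYID>123457</STORYID></STORY>
</STORIES>';

$dom = new DOMDocument();
$dom->loadXML($xml);              // for a file on disk, use $dom->load($path)
$xpath = new DOMXPath($dom);

$rows = array();
foreach ($xpath->query('//STORY') as $node) {   // no trailing slash
    $rows[] = array(
        'byline'  => $node->getElementsByTagName('BYLINE')->item(0)->nodeValue,
        'storyid' => $node->getElementsByTagName('STORYID')->item(0)->nodeValue,
    );
}

print_r($rows);
```

Each entry in `$rows` corresponds to one INSERT, which is why the query belongs inside the loop.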
You can actually use the mysql client directly to import XML data. I don't have enough experience with it to give you a full code sample, but the MySQL docs cover it in some detail.
Essentially, you can do something like this:
LOAD XML LOCAL INFILE 'address.xml' INTO TABLE equitymarketnews ROWS IDENTIFIED BY '<STORY>';
I am attempting to scrape the web page (see code), as well as the pages going back in time (you can see the date '20110509' in the page URL), for simple numerical strings. After much trial and error (I'm new to programming), I can't figure out how to parse the specific data I want from the table. I have been trying to use simple PHP/HTML without curl or other such things. Is this possible? I think my main issue is using the delimiters that are necessary to get the data out of the source code.
What I'd like is for the program to start at the very first page it can, say for example '20050101', and scan through each page till the current date, grabbing the specific data for example, the "latest close" (column), "closing arm" (row), and have that value for the corresponding date exported to a single .txt file, with the date being separated from the value with a comma. Each time the program is run, the date/value should be appended to the existing text file.
I am aware many lines of the code below are junk, it's part of my learning process.
<html>
<title>HTML with PHP</title>
<body>
<?php
$rawdata = file_get_contents('http://online.wsj.com/mdc/public/page/2_3021-tradingdiary2-20110509.html?mod=mdc_pastcalendar');
//$data = substr(' ', $data);
//$begindate = '20050101';
//$newlines = array("\t","\n","\r","\x20\x20","\0","\x0B");
//if (preg_match(' <td class="text"> ' , $data , $content)) {
//$content = str_replace($newlines
echo $rawdata;
///file_put_contents( 'NYSETRIN.html' , $content , FILE_APPEND);
?>
<b>some more html</b>
<?php
?>
</body>
</html>
All right so let's do this. We're going to first load the data into an HTML parser, then create an XPath parser out of it. XPath will help us navigate around the HTML easily. So:
$date = "20110509";
$data = file_get_contents("http://online.wsj.com/mdc/public/page/2_3021-tradingdiary2-{$date}.html?mod=mdc_pastcalendar");
$doc = new DOMDocument();
@$doc->loadHTML($data);
$xpath = new DOMXpath($doc);
Now then we need to grab some data. First off let's get all the data tables. Looking at the source, these tables are indicated by a class of mdcTable:
$result = $xpath->query("//table[@class='mdcTable']");
echo "Tables found: {$result->length}\n";
So far:
$ php test.php
Tables found: 5
Okay, so we have the tables. Now we need to get a specific column. So let's use the "Latest close" column you mentioned:
$result = $xpath->query("//table[@class='mdcTable']/*/td[contains(.,'Latest close')]");
foreach($result as $td) {
echo "Column contains: {$td->nodeValue}\n";
}
The result so far:
$ php test.php
Column contains: Latest close
Column contains: Latest close
Column contains: Latest close
... etc ...
Now we need the column index for getting the specific column for the specific row. We do this by counting all of the previous sibling elements, then adding one. This is because element index selectors are 1 indexed, not 0 indexed:
$result = $xpath->query("//table[@class='mdcTable']/*/td[contains(.,'Latest close')]");
$column_position = $xpath->query('preceding-sibling::*', $result->item(0))->length + 1;
echo "Position is: $column_position\n";
Result is:
$ php test.php
Position is: 2
Now we need to get our specific row:
$data_row = $xpath->query("//table[@class='mdcTable']/*/td[starts-with(.,'Closing Arms')]");
echo "Returned {$data_row->length} row(s)\n";
Here we use starts-with, since the row label has a utf-8 symbol in it. This makes it easier. Result so far:
$ php test.php
Returned 4 row(s)
Now we need to use the column index to get the data we want:
$data_row = $xpath->query("//table[@class='mdcTable']/*/td[starts-with(.,'Closing Arms')]/../*[$column_position]");
foreach($data_row as $row) {
echo "{$date},{$row->nodeValue}\n";
}
Result is:
$ php test.php
20110509,1.26
20110509,1.40
20110509,0.32
20110509,1.01
Which can now be written to a file. Now, we don't have the markets these apply to, so let's go ahead and grab those:
$headings = array();
$market_headings = $xpath->query("//table[@class='mdcTable']/*/td[@class='colhead'][1]");
foreach($market_headings as $market_heading) {
$headings[] = $market_heading->nodeValue;
}
Now we can use a counter to reference which market we're on:
$data_row = $xpath->query("//table[@class='mdcTable']/*/td[starts-with(.,'Closing Arms')]/../*[$column_position]");
$i = 0;
foreach($data_row as $row) {
echo "{$date},{$headings[$i]},{$row->nodeValue}\n";
$i++;
}
The output being:
$ php test.php
20110509,NYSE,1.26
20110509,Nasdaq,1.40
20110509,NYSE Amex,0.32
20110509,NYSE Arca,1.01
Now for your part:
This can be made into a function that takes a date
You'll need code to write out the file. Check out the filesystem functions for hints
This can be made extendible to use different columns and different rows
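As a starting point for the first two items, here is one possible sketch. The function name `closing_arms`, the inline sample markup and the temporary output file are all hypothetical; the real page would be fetched with `file_get_contents()` as shown earlier, and the XPath uses `//td` rather than `/*/td` so it does not depend on the exact nesting.

```php
<?php
// Returns one "date,value" line per table that has a "Closing Arms" row.
function closing_arms(string $html, string $date): array
{
    libxml_use_internal_errors(true);      // keep warnings about messy HTML quiet
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    $head = $xpath->query("//table[@class='mdcTable']//td[contains(.,'Latest close')]")->item(0);
    if ($head === null) {
        return array();                    // column heading not found
    }
    // XPath positions are 1-based: preceding siblings plus one.
    $pos = $xpath->query('preceding-sibling::*', $head)->length + 1;

    $lines = array();
    $cells = $xpath->query("//table[@class='mdcTable']//td[starts-with(.,'Closing Arms')]/../*[$pos]");
    foreach ($cells as $cell) {
        $lines[] = $date . ',' . trim($cell->nodeValue);
    }
    return $lines;
}

// Hypothetical sample standing in for the downloaded page.
$sample = '<table class="mdcTable">
  <tr><td class="colhead">NYSE</td><td class="colhead">Latest close</td><td class="colhead">Previous close</td></tr>
  <tr><td>Closing Arms</td><td>1.26</td><td>0.98</td></tr>
</table>';

$lines = closing_arms($sample, '20110509');

// FILE_APPEND adds to the file on every run instead of overwriting it.
$file = tempnam(sys_get_temp_dir(), 'trin');
file_put_contents($file, implode("\n", $lines) . "\n", FILE_APPEND);
echo file_get_contents($file);
```

Looping over a range of dates and calling the function once per page would then cover the history the question asks for.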
I'd recommend using the HTML Agility Pack. It's an HTML parser (a .NET library) that is very handy for finding particular content within an HTML document.