Scraping a web page for all headings and contents in PHP

I have looked around the web for how to scrape all headings (h1 to h6) together with their content, like <h2>Some Heading</h2> or <h4>Some Heading</h4>. I have even looked at file_get_html(), which PHP does not recognize. The code I have written so far lets you see the content but without the h1 tags. I am new to this, so if anyone can help me I would appreciate it. Here is the code I have now:
<html>
<head>
<title></title>
</head>
<body>
<?php
$theurl = "http://www.msn.com";
if(!($contents=file_get_contents($theurl)))
{
echo 'Could not open URL';
exit;
}else{
echo "The $theurl is open <br />";
}
$pattern = "/<h[1-6]>(.*?)<\/h[1-6]>/si";
$found = preg_match_all($pattern,$contents,$matches);
if(is_array($matches) && count($matches) >= 1){
echo "Scraping $theurl<br />";
for($i = 1; $i <= $found - 1; $i++){
echo $matches[0][$i];
}
}else{
echo "No heading found";
}
?>
</body>
</html>
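A few things in the code above look like likely causes, so here is a minimal sketch rather than a definitive fix. The loop starts at index 1 and therefore skips the first match, the pattern only matches headings without attributes (so <h1 class="..."> is missed), and echoing the raw match lets the browser render the heading instead of showing its tags. The function name extractHeadings() and the sample HTML are made up for illustration. As far as I know, file_get_html() is not built into PHP; it comes from the third-party Simple HTML DOM library, which would have to be included first.

```php
<?php
// Hypothetical helper: collect every h1-h6 heading with its level and text.
function extractHeadings(string $html): array
{
    // \b[^>]* allows attributes; \1 forces the closing tag to match the opening level.
    $pattern = '/<h([1-6])\b[^>]*>(.*?)<\/h\1>/si';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $headings = [];
    foreach ($matches as $m) {
        $headings[] = [
            'level' => (int) $m[1],
            'text'  => trim(strip_tags($m[2])), // drop nested markup such as <em>
        ];
    }
    return $headings;
}

// Sample input standing in for the result of file_get_contents($theurl).
$html = '<h1 class="a">Title</h1><p>x</p><h3>Sub <em>part</em></h3>';

foreach (extractHeadings($html) as $h) {
    // htmlspecialchars() makes the tags visible as text in the browser.
    echo htmlspecialchars("<h{$h['level']}>{$h['text']}</h{$h['level']}>"), "<br />\n";
}
```

Regular expressions are fragile on real-world HTML, so for anything beyond a quick scrape a DOM parser is probably the safer choice.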

Related

Why doesn't output control catch SQL?

I wrote code that changes h1 headings to h2 when they are not the first heading of that type. It works, but only on a plain PHP page. On a website that uses a database to output its content, the code doesn't work.
I call ob_start() before <html> and ob_get_contents() + ob_end_clean() + the rest of my code after </html>, so I think something goes wrong when "catching" the content from the database, but I'm not sure. I tried the code on a WordPress-based website.
My code (I know it probably isn't the best solution, but it works on any website without a CMS):
<!DOCTYPE html>
<?php
ob_start();
?>
<html>
<head>
<title>random page</title>
</head>
<body>
<div class="container">
<div class="row">
<div class="col-md-5">
<h1>afdsfdassfdfdsa</h1>
</div>
</div>
</div>
<h1>sadffsadf afdsfdsa afsdfs?</h1>
<h1 class="superclass">random text?</h1>
<div>
<p>some ending content</p>
</div>
</body>
</html>
<?php
$bufferContent = ob_get_contents();
ob_end_clean();
$matchedContent = '';
$endContent = '';
$modifiedContent = '';
$firstHeadingBoolean = true;
$h1Pattern = array();
$h1Pattern[0] = '%<h1(.*?)>%';
$h1Pattern[1] = '%</h1>%';
$h1Replacement = array();
$h1Replacement[0] = '<h2$1>';
$h1Replacement[1] = '</h2>';
if(preg_match_all('%((.|\n)*?)(<h1.*?>.*?</h1>)%', $bufferContent, $contentMatches)){
foreach($contentMatches[0] as $matches) {
$matchedContent .= $matches;
}
$endContent = str_replace($matchedContent, '', $bufferContent);
foreach ($contentMatches[0] as $matches) {
if(!$firstHeadingBoolean){
$firstHeadingBoolean = false;
} else {
$matches = preg_replace($h1Pattern, $h1Replacement, $matches);
}
$modifiedContent .= $matches;
}
echo $modifiedContent;
echo $endContent;
} else {
echo $bufferContent;
}
?>
EDIT: I tried the solutions from "WordPress filter to modify final html output", but nothing changed.
Now, after some testing, I can see that it fails because preg_match_all doesn't work correctly. Does anyone have an idea what is wrong with that preg_match_all? I tested the regex pattern on regex101 and on my localhost and everything worked fine. I don't understand why it isn't working here.
OK, so it wasn't a problem with SQL but with the regex. To solve it I used code like this:
<?php
ob_start();
?>
<!-- html code -->
<?php
$bufferContent = ob_get_contents();
ob_end_clean();
$bufferContent = preg_split("%(?=<h1)%",$bufferContent);
$i = 0;
$h1Pattern = array();
$h1Pattern[0] = '%<h1(.*?)>%';
$h1Pattern[1] = '%</h1>%';
$h1Replacement = array();
$h1Replacement[0] = '<h2$1>';
$h1Replacement[1] = '</h2>';
foreach($bufferContent as $bufferElement){
if(preg_match("%<h1%", $bufferElement)){
if($i > 0){
echo preg_replace($h1Pattern, $h1Replacement, $bufferElement);
} else {
echo $bufferElement;
$i++;
}
} else {
echo $bufferElement;
}
}
?>
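For comparison, here is a sketch of the same idea done in a single pass with preg_replace_callback() and a counter, which avoids the (.|\n)*? construct from the first attempt. That construct can run into PCRE backtracking limits on large pages, which may be why preg_match_all failed on the WordPress output. The function name demoteExtraH1() is hypothetical, and the sketch assumes h1 elements are not nested.

```php
<?php
// Hypothetical helper: keep the first <h1> and turn every later one into <h2>.
function demoteExtraH1(string $html): string
{
    $seen = 0; // number of opening <h1> tags encountered so far

    $result = preg_replace_callback(
        '%<(/?)h1\b([^>]*)>%i',
        function (array $m) use (&$seen): string {
            if ($m[1] === '') {
                $seen++; // opening tag; $m[1] is "/" for a closing tag
            }
            // Rewrite both the opening and the closing tag of every h1 after the first.
            return $seen > 1 ? "<{$m[1]}h2{$m[2]}>" : $m[0];
        },
        $html
    );

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}

echo demoteExtraH1('<h1>a</h1><p>x</p><h1 class="c">b</h1>');
```

With output buffering, this would be applied to the string returned by ob_get_contents() before echoing it.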

How to validate php if form is filled

I have an external PHP validation file and an HTML file. I'm trying to validate whether the input boxes are filled out correctly. So far I have one check, but I can't get it to run, and when I tested it, it didn't work. Do I need to download something, or is my code completely wrong? I'm just trying to check whether the field is empty and whether it has at least 3 characters.
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Cal5t</title>
</head>
<body>
<?php
$title= $_REQUEST["title"];
if ($title == "" or >3 ) {
echo "<p>5please!</p>";
?>
</body>
</html>
You are probably looking for something like:
if (($title == "") or (strlen($title) < 5))
{
echo "<p>5please!</p>";
}
if(!empty($title= $_POST['title']) && strlen($title) > 5){
echo "valid: $title";
} else {
echo "incorrect title: '$title'";
}
Also, it's better to use $_POST or $_GET over $_REQUEST.
I think you're looking for:
if(strlen($title) < 5){
echo '<p>The title you entered is not long enough.</p>';
}
You can make sure there is a value with the isset() function.
Required validation:
$title = trim($_POST['title']); // removes leading and trailing whitespace
if(strlen($title) <= 0)
{
echo "Title Field is required";
}
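Putting the suggestions above together, a minimal sketch could look like this. It checks that the field exists, trims it, and enforces the "at least 3 characters" rule from the question. The function name validateTitle() and the error messages are made up for illustration.

```php
<?php
// Hypothetical helper: returns an error message, or null when the title is valid.
function validateTitle(array $input, int $minLength = 3): ?string
{
    // Missing or non-string values are treated like an empty field.
    $raw = $input['title'] ?? '';
    $title = is_string($raw) ? trim($raw) : '';

    if ($title === '') {
        return 'Title is required.';
    }
    if (strlen($title) < $minLength) {
        return "Title must be at least {$minLength} characters.";
    }
    return null;
}

// Usage: pass $_POST in the real form handler.
$error = validateTitle(['title' => '  ab ']);
echo $error === null ? 'valid' : htmlspecialchars($error);
```

Returning the message instead of echoing it keeps the check reusable for other fields and makes it easy to test.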

Cannot add php code to a variable?

I am trying to create a PHP file each time a user registers on my website. I use the following code in my register.php to create the file.
The file creation itself works, but the $data variable doesn't produce the result I expect. When I use $data as a single variable in a different PHP file, it still doesn't work.
What did I do wrong when setting the variable?
// STARTING to create a file
$my_file = "$username.php";
$handle = fopen("give/$my_file", 'w') or die('Cannot open file: '.$my_file);
//----------- BEGINNING OF THE PHP DATA TO WRITE TO NEW FILE ----------
$data = "<?
require('../config.inc.php');
$damned_user = $username;
if ( $_COOKIE['damn_given'] != TRUE ) {
$sql = mysql_query(\"SELECT * FROM users WHERE username='$damned_user' LIMIT 1\");
if(mysql_num_rows($sql) == 1){
$row = mysql_fetch_array($sql);
// $row['field'];
$damned_user_id = $row['id'];
if($_SESSION['id'] == $damned_user_id) {
} else {
$taken = $row['taken_damns'];
$taken_damns = $taken + 1;
$taking_sql = \"UPDATE users SET taken_damns='$taken_damns' WHERE username='$damned_user' \";
if (mysql_query($taking_sql)) {
setcookie(\"damn_given\", TRUE, time()+3600*24);
$date = date(\"Y-m-d H:i:s\");
$ip = $_SERVER['REMOTE_ADDR'];
$damns_table = \"INSERT INTO damns (id, from_ip, user_damned, when_damned) VALUES ('','$ip','$damned_user','$date') \";
if ( mysql_query($damns_table)) {
} else {
echo \"Couldn't save damn to damns table in database!\";
}
if ( $_SESSION['logged'] == TRUE ) {
$session_id = $_SESSION['id'];
$giving_sql = \"UPDATE users SET given_damns='$taken_damns' WHERE id='$session_id'\";
if ( mysql_query($giving_sql ) ) {
} else {
echo ('Error giving damn!');
}
}
}
else
{
die (\"Error taking damn!\");
}
}
} else {
die(\"Error first sql!\");
}
}
?>
<html>
<head>
<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />
<link rel=\"stylesheet\" href=\"/_common.css\" />
<link rel=\"stylesheet\" href=\"/_col_white.css\" />
<link rel=\"shortcut icon\" href=\"/favicon.ico\" /> <title>DamnIt.tk - Damned!</title>
</head>
<body>
<div id=\"main\">
<div class=\"center\"><img src=\"/_bnr_white.png\" style=\"width: 500px; height: 100px;\" alt=\"DamnIt Banner\" /></div>
<table class=\"tmid\" style=\"width: 100%;\"><tr>
<td class=\"center\" style=\"width: 25%;\">Profile</td>
<td class=\"center\" style=\"width: 25%;\">Options</td>
<td class=\"center\" style=\"width: 25%;\">Stats</td>
<td class=\"center\" style=\"width: 25%;\">Log out</td>
</tr></table> <h1>Give a Damn</h1>
<?
if ( isset($_COOKIE['damn_given'])) {
?>
<h2>You have already given a Damn to <? echo $damned_user ?> today!</h2><h3>Couldn't damn - try again tomorrow.</h3>
<?
}
elseif ( $_SESSION['id'] == $damned_user_id ) {
?>
<h2>You cannot damn yourself!</h2>
<?
} else{ ?> <h2>Damn given!</h2><h3>You have given a Damn to <? echo $damned_user ?>.</h3> <? } ?>
</div></body>
</html>";
//------- END OF PHP WHICH MUST BE WRITTEN TO NEW FILE ---------
fwrite($handle, $data);
fclose($handle);
// finished with the file
Try NOWDOC:
$data = <<<'END'
Your PHP code here
END;
This will allow for any string, without need for escaping.
However, please consider what you're doing very carefully!
Also... you wouldn't happen to be trying to rip off this site of mine, would you? http://giveadamn.co.uk/
Because if so, you're doing it wrong. .htaccess, mate ;)
RewriteEngine on
RewriteRule give/(.*) give.php?user=$1 [L]
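To illustrate why the double-quoted version misbehaves: PHP replaces every $variable inside a double-quoted string at the moment the string is built, so the generated file receives the current (mostly empty) values instead of the variable names. A nowdoc keeps the text verbatim. The sketch below assumes a made-up placeholder, {{USERNAME}}, for the one value that really has to be injected.

```php
<?php
$username = 'alice'; // hypothetical value for illustration

// Double quotes: $username is substituted while the string is being built.
$doubleQuoted = "echo $username;";

// Nowdoc: nothing is interpolated, so the PHP source survives as written.
$template = <<<'END'
$damned_user = '{{USERNAME}}';
echo $damned_user;
END;

// Inject the one value that must vary; var_export() quotes it as a PHP string literal.
$data = str_replace("'{{USERNAME}}'", var_export($username, true), $template);

echo $doubleQuoted, "\n", $data, "\n";
```

The rewrite rule above avoids generating files altogether, which is likely the more maintainable route.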

My pagination isn't working properly. Help!

<?php
// Code not directly relevant omitted (including lots of vars)
//Create pagination links
$first = "First";
$prev = "Prev";
$next = "Next";
$last = "Last";
if($current_page>1)
{
$prevPage = $current_page - 1;
$first = "First";
$prev = "Prev";
}
if($current_page<$total_pages)
{
$nextPage = $current_page + 1;
$next = "Next";
$last = "Last";
}
?>
<html>
<title></title>
<body>
<h2>Here are the records for page <?php echo $current_page; ?></h2>
<ul>
<?php echo $slots; ?>
</ul>
Page <?php echo $current_page; ?> of <?php echo $total_pages; ?>
<br />
<?php echo "{$first} | {$prev} | {$next} | {$last}"; ?>
</body>
</html>
EDIT/UPDATE:
I just realized I had a file called test.php a while back. I deleted it, but I guess it's still on my site somehow. Nevertheless, I replaced the word test with works. Now when I click Next, it takes me to mydomain.com/works.php?page=2, but that shows a 404 error :/
Can somebody please tell me where I screwed up? Thanks!
The code certainly looks good; are you sure the problem is in the links being generated? Maybe you are being redirected from the "correct" page for some other reason?
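The posted code no longer shows the link markup, so the following is only a sketch of how the four labels could be turned into links under the same conditions. The function name paginationLinks() is hypothetical, and works.php is taken from the update above; a 404 on that URL would suggest the script file itself is missing on the server rather than a problem with the generated links.

```php
<?php
// Hypothetical helper: build "First | Prev | Next | Last", linking only where a target page exists.
function paginationLinks(int $currentPage, int $totalPages, string $script = 'works.php'): string
{
    $link = function (string $label, int $page) use ($script): string {
        return '<a href="' . htmlspecialchars($script . '?page=' . $page) . '">' . $label . '</a>';
    };

    // Plain text by default, as in the original code.
    $first = 'First';
    $prev  = 'Prev';
    $next  = 'Next';
    $last  = 'Last';

    if ($currentPage > 1) {
        $first = $link('First', 1);
        $prev  = $link('Prev', $currentPage - 1);
    }
    if ($currentPage < $totalPages) {
        $next = $link('Next', $currentPage + 1);
        $last = $link('Last', $totalPages);
    }

    return "{$first} | {$prev} | {$next} | {$last}";
}

echo paginationLinks(1, 3);
```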

Access a PHP array from JavaScript

I have the following code where I declare a PHP array variable and inside a function, I put some data into the array. I also display buttons mapped to each index of the array that will show the data in the PHP array for that index number.
When testing on a browser, I don't get the right answer. I checked the page source, it had code like data_array = ["<?php echo implode ('',Array); ?>"]; instead of the text from the Array.
What am I doing wrong and what should I do to get the correct output? (BTW, I tried to execute the same without declaring the function and it seemed to work, but I need a function for my work and can't take that approach).
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html lang="en">
<head>
<title>Example</title>
<?php
$giant_says = array();
function display() {
global $giant_says;
$giant_says[] = "<a href='http://www.google.com'>Google</a>";
$giant_says[] = "Yahoo!";
$giant_says[] = "Bing";
echo "<div id='content'>";
echo $giant_says[0];
echo "</div><br><br>";
$i = 0;
while($i < count($giant_says)) {
echo "<input type='button' value='".$i."' onClick=\"addtext(".$i.");return false;\"";
$i += 1;
}
}
?>
<script type="text/javascript">
function addtext(index) {
giantSays = ["<?php echo implode ('","', $giant_says); ?>"];
document.getElementById('content').innerHTML = giantSays[index];
}
</script>
</head>
<body>
<?php
display();
?>
</body>
</html>
You have the order wrong, which causes implode() to run on an empty array. I also suggest using json_encode() instead of implode(); it exists for this type of thing. Updated example below:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html lang="en">
<head>
<title>Example</title>
<?php
$giant_says = array();
function display(&$giant_says) {
// Calculate the array (referenced)
$giant_says[] = "<a href='http://www.google.com'>Google</a>";
$giant_says[] = "Yahoo!";
$giant_says[] = "Bing";
// Return the HTML, to display later
ob_start();
echo "<div id='content'>";
echo $giant_says[0];
echo "</div><br><br>";
$i = 0;
while($i < count($giant_says)) {
echo "<input type='button' value='".$i."' onClick=\"addtext(".$i.");return false;\">";
$i += 1;
}
$Return = ob_get_contents();
ob_end_clean();
return $Return;
}
$Display = display($giant_says);
?>
<script type="text/javascript">
function addtext(index) {
giantSays = <?php echo json_encode($giant_says); ?>;
document.getElementById('content').innerHTML = giantSays[index];
}
</script>
</head>
<body>
<?php
echo $Display;
?>
</body>
</html>
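A small sketch of why json_encode() is the safer of the two: as soon as an entry contains a double quote, the hand-built implode() version produces a broken JavaScript array literal, while json_encode() escapes it. The sample strings are made up for illustration.

```php
<?php
// Sample data containing a double quote, which the implode() approach cannot handle.
$giant_says = ['He said "hi"', 'Yahoo!'];

// Hand-built literal: the inner quotes end the JavaScript string early.
$viaImplode = '["' . implode('","', $giant_says) . '"]';

// json_encode() escapes quotes and backslashes for us.
$viaJson = json_encode($giant_says);

echo $viaImplode, "\n"; // ["He said "hi"","Yahoo!"]
echo $viaJson, "\n";    // ["He said \"hi\"","Yahoo!"]
```

When the data may contain markup from users, passing a flag such as JSON_HEX_TAG to json_encode() is worth considering before embedding the result in a script block.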
You're trying to implode the $giant_says array before you've filled it (you're calling display() after the implode when the call needs to happen before).
The problem is that display(), which fills the array, is only called after the HTML part containing the JavaScript has already been sent.
The HTML in the file behaves like an echo of that HTML from your PHP: it has already been output by the time display() runs. Fill the array before the HTML code.
Example:
<?php
$giant_says = array();
$giant_says[] = "<a href='http://www.google.com'>Google</a>";
$giant_says[] = "Yahoo!";
$giant_says[] = "Bing";
function display() {
global $giant_says;
echo '<div id="content">'.$giant_says[0]."</div><br><br>";
$i = 0;
while($i < count($giant_says)) {
echo "<input type='button' value='".$i."' onClick=\"addtext(".$i.");\" />";
$i += 1;
}
}
?>
<html>
<head>
<title>Example</title>
<script type="text/javascript">
function addtext(index) {
giantSays = ["<?php echo implode ('","', $giant_says); ?>"];
document.getElementById('content').innerHTML = giantSays[index];
return false;
}
</script>
</head>
<body>
<?php display(); ?>
</body>
</html>
