Hi, I need a bit of help

2006-07-24 11:11:39 UTC

I would like to be able to read (parse) an html file into my Java
program. Once I'm able to do this, I need to be able to analyse the
html code.

If you could offer any help in meeting the first goal - parsing html
files - I would be very grateful. Even if it's a link to somewhere, or
perhaps a book to read, that's fine too.

Many thanks,
vk

Andrew Thompson

2006-07-24 12:37:08 UTC

Post by vk
I would like to be able to read (parse) an html file into my Java
program. Once I'm able to do this, I need to be able to analyse the
html code.

<sscce>
import javax.xml.parsers.*;
import org.w3c.dom.*;
import javax.swing.*;
import java.net.*;
import java.util.*;

public class ParseHTML extends JApplet {

    JTree tree;

    public void init() {
        // Nested Vectors become the nodes of the JTree.
        Vector<Object> v = new Vector<Object>();
        URL index = getDocumentBase();
        try {
            // Parse the page hosting this applet as XML.
            Document doc = DocumentBuilderFactory.
                newInstance().
                newDocumentBuilder().
                parse(index.toURI().toString());
            Element root = doc.getDocumentElement();
            NodeList children = root.getChildNodes();
            processElements( children, v );
        } catch(Exception e) {
            // Show the failure in the tree; toString() is never null.
            v.add(e.toString());
        }
        tree = new JTree(v);
        // The row count grows as rows expand, so this opens every level.
        for (int ii=0; ii< tree.getRowCount(); ii++) {
            tree.expandRow(ii);
        }
        getContentPane().add( new JScrollPane(tree) );
    }

    /** Adds each node to v; child nodes of an element go in a nested Vector. */
    public void processElements(
        NodeList list,
        Vector<Object> v) {

        for (int ii=0; ii< list.getLength(); ii++) {
            v.add( list.item(ii).toString() );
            if ( list.item(ii) instanceof Element ) {
                Element e = (Element)list.item(ii);
                NodeList children = e.getChildNodes();
                Vector<Object> v1 = new Vector<Object>();
                v.add( v1 );
                processElements( children, v1 );
            }
        }
    }
}
</sscce>
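
For anyone who wants to try the same recursive walk without an applet or a browser, here is a minimal sketch of the idea. The class name DomOutline and the methods outline and collect are made up for illustration and are not part of the code above; it parses a string instead of the document base and records only element names, indented by depth.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not from the post above: same recursion, no Swing.
public class DomOutline {

    // Parses well-formed markup and returns one indented line per element.
    static List<String> outline(String markup) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(markup)));
            List<String> lines = new ArrayList<String>();
            collect(doc.getDocumentElement(), 0, lines);
            return lines;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse markup", e);
        }
    }

    // Depth-first walk: record the element name, then visit its children.
    static void collect(Node node, int depth, List<String> lines) {
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return; // skip text, comments and other non-element nodes
        }
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            line.append("  ");
        }
        lines.add(line.append(node.getNodeName()).toString());
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), depth + 1, lines);
        }
    }

    public static void main(String[] args) {
        String page = "<html><head><title>T</title></head>"
                    + "<body><h1>Hi</h1><p>text</p></body></html>";
        for (String line : outline(page)) {
            System.out.println(line);
        }
    }
}
```

Once the elements are in a plain list like this, analysing the page (counting tags, finding headings and so on) is ordinary collection code.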

<**html>
<!DOCTYPE HTML>
<HTML>
<HEAD>
<title>Parse HTML</title>
</HEAD>
<BODY>
<h1>Example of parsing (valid) HTML</h1>
<p>The applet in this web page loads the web page and attempts to
parse it into a org.w3c.dom.Document object.</p>
<p>The documents parsed must be well formed, which is
uncommon for most web pages.</p>
<APPLET
CODE="ParseHTML.class"
CODEBASE="."
WIDTH="600" HEIGHT="600">
</APPLET>
</BODY>
</HTML>
</**html>
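
To make the "must be well formed" point in the page above concrete, here is a rough sketch that feeds two snippets to the same kind of parser. WellFormedCheck and isWellFormed are illustrative names only; the one difference between the two inputs is a `<br>` that is never closed, which is typical of real-world HTML.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether the XML parser accepts the markup.
public class WellFormedCheck {

    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler keeps the parser from printing errors to stderr;
            // a fatal error still ends the parse with a SAXException.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false; // mismatched or unclosed tags end up here
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Every tag closed: accepted.
        String closed = "<html><body><p>one<br/>two</p></body></html>";
        // Browsers accept this, but the unclosed <br> breaks an XML parser.
        String sloppy = "<html><body><p>one<br>two</p></body></html>";
        System.out.println(isWellFormed(closed));
        System.out.println(isWellFormed(sloppy));
    }
}
```

A check like this is a cheap way to find out in advance whether a given page has any chance of loading in the applet.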

HTH

Andrew T.

2006-07-24 18:57:29 UTC

thanx a ton

o***@gmail.com

2015-09-30 08:16:46 UTC

I didn't end up using this because our (big and ugly) HTML was not well-formed enough and it was almost impossible to fix it to work with your suggestion, but this is the best - and ONLY - solution I found for this issue, and it is a rather brilliant one. Well done and thanks!

Ofer