8. Extract attributes, text, and HTML from elements

Source: Baidu Wenku  Editor: Shenma Wenxue Wang  Time: 2024/04/29 06:52:23

Problem

After parsing a document and finding some elements, you'll want to get at the data inside those elements.

Solution

  • To get the value of an attribute, use the Node.attr(String key) method
  • For the text of an element combined with the text of all its children, use Element.text()
  • For HTML, use Element.html(), or Node.outerHtml() as appropriate

For example:

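import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

String html = "<p>An <a href='http://example.com/'><b>example</b></a> link.</p>";
Document doc = Jsoup.parse(html);
Element link = doc.select("a").first(); // first <a> element in the document

String text = doc.body().text(); // "An example link."
String linkHref = link.attr("href"); // "http://example.com/"
String linkText = link.text(); // "example"

String linkOuterH = link.outerHtml();
    // "<a href="http://example.com/"><b>example</b></a>"
String linkInnerH = link.html(); // "<b>example</b>"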
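The snippet above is a fragment rather than a complete program. The following is a minimal runnable sketch, assuming jsoup is on the classpath; the class ExtractDemo and its helper methods are hypothetical names used only for illustration. It also shows two details of the accessors: Node.attr(String key) returns an empty string, not null, when the attribute is absent (Node.hasAttr(String key) tells the two cases apart), and Element.text() includes the text of child elements, whereas Element.ownText() keeps only the text directly inside the element.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

// Hypothetical demo class; assumes jsoup is on the classpath.
public class ExtractDemo {

    // Sample markup mirroring the example above.
    static final String HTML =
        "<p>An <a href='http://example.com/'><b>example</b></a> link.</p>";

    static Element first(String cssQuery) {
        // select() returns an Elements list; first() is null if nothing matched.
        return Jsoup.parse(HTML).select(cssQuery).first();
    }

    // Text of the <p> combined with the text of all its descendants.
    static String combinedText() {
        return first("p").text();
    }

    // Only the text nodes that are direct children of the <p>.
    static String ownText() {
        return first("p").ownText();
    }

    // attr() returns "" rather than null for an attribute that is not set.
    static String missingAttr() {
        return first("a").attr("title");
    }

    // hasAttr() distinguishes "absent" from "present but empty".
    static boolean hasTitle() {
        return first("a").hasAttr("title");
    }

    public static void main(String[] args) {
        System.out.println(combinedText());            // An example link.
        System.out.println(ownText());                 // text outside the <a> only
        System.out.println("[" + missingAttr() + "]"); // []
        System.out.println(hasTitle());                // false
    }
}
```

Because attr() never returns null, its result can be used directly; check hasAttr() first when an empty value and a missing attribute must be handled differently.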

Description

The methods above are the core element data access methods. There are others as well:

  • Element.id()
  • Element.tagName()
  • Element.className() and Element.hasClass(String className)
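A short sketch of these accessors, assuming jsoup is on the classpath; the AccessorDemo class and its sample markup are hypothetical. Note that className() returns the whole class attribute as written, while hasClass(String className) tests for a single class name.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

// Hypothetical demo class; assumes jsoup is on the classpath.
public class AccessorDemo {

    // Hypothetical sample markup with an id and two class names.
    static final String HTML =
        "<div id='main' class='content wide'>Hello</div>";

    static Element div() {
        return Jsoup.parse(HTML).select("div").first();
    }

    public static void main(String[] args) {
        Element div = div();
        System.out.println(div.id());               // main
        System.out.println(div.tagName());          // div
        System.out.println(div.className());        // content wide
        System.out.println(div.hasClass("wide"));   // true
        System.out.println(div.hasClass("narrow")); // false
    }
}
```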

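Many of these accessor methods have corresponding setter methods for changing the data, for example Node.attr(String key, String value), Element.text(String text) and Element.html(String html).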
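A minimal sketch of a few setters, assuming jsoup is on the classpath; SetterDemo and its sample markup are hypothetical. It uses attr(String key, String value), text(String text), html(String html) and addClass(String className). Both text(String) and html(String) clear the element's existing children before adding the new content.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

// Hypothetical demo class; assumes jsoup is on the classpath.
public class SetterDemo {

    static Element editedLink() {
        Element link = Jsoup.parse(
                "<p>An <a href='http://example.com/'><b>example</b></a> link.</p>")
            .select("a").first();
        link.attr("href", "http://example.org/"); // overwrite the attribute value
        link.text("another example");             // clears <b>, leaves one text node
        link.addClass("external");                // appends to the class attribute
        return link;
    }

    static Element editedParagraph() {
        Element p = Jsoup.parse("<p>old</p>").select("p").first();
        p.html("<i>new</i> content");             // clears children, parses new HTML
        return p;
    }

    public static void main(String[] args) {
        Element link = editedLink();
        System.out.println(link.attr("href"));         // http://example.org/
        System.out.println(link.text());               // another example
        System.out.println(link.hasClass("external")); // true
        System.out.println(editedParagraph().text());  // new content
    }
}
```

Each setter returns the element itself, so calls can also be chained, e.g. link.attr("href", "...").addClass("external").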

See also

  • The reference documentation for Element and the collection Elements class
  • Working with URLs
  • Finding elements with the CSS selector syntax