10. Example program: list links

This example program demonstrates how to fetch a page from a URL; extract links, images, and other pointers; and examine their URLs and text.

Specify the URL to fetch as the program's sole argument.

package org.jsoup.examples;

import org.jsoup.Jsoup;
import org.jsoup.helper.Validate;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;

/**
 * Example program to list links from a URL.
 */
public class ListLinks {
    public static void main(String[] args) throws IOException {
        Validate.isTrue(args.length == 1, "usage: supply url to fetch");
        String url = args[0];
        print("Fetching %s...", url);

        // Fetch and parse the page, then select the three groups of elements to report.
        Document doc = Jsoup.connect(url).get();
        Elements links = doc.select("a[href]");
        Elements media = doc.select("[src]");
        Elements imports = doc.select("link[href]");

        // Anything with a src attribute; images also get their dimensions and alt text.
        print("\nMedia: (%d)", media.size());
        for (Element src : media) {
            if (src.tagName().equals("img"))
                print(" * %s: <%s> %sx%s (%s)",
                        src.tagName(), src.attr("abs:src"), src.attr("width"), src.attr("height"),
                        trim(src.attr("alt"), 20));
            else
                print(" * %s: <%s>", src.tagName(), src.attr("abs:src"));
        }

        // <link href> elements such as stylesheets and icons.
        print("\nImports: (%d)", imports.size());
        for (Element link : imports) {
            print(" * %s <%s> (%s)", link.tagName(), link.attr("abs:href"), link.attr("rel"));
        }

        // Anchors, with their absolute URL and (trimmed) link text.
        print("\nLinks: (%d)", links.size());
        for (Element link : links) {
            print(" * a: <%s>  (%s)", link.attr("abs:href"), trim(link.text(), 35));
        }
    }

    private static void print(String msg, Object... args) {
        System.out.println(String.format(msg, args));
    }

    // Shortens s to at most width characters, ending in "." when it was cut.
    private static String trim(String s, int width) {
        if (s.length() > width)
            return s.substring(0, width - 1) + ".";
        else
            return s;
    }
}
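
The program reads attributes with the abs: prefix (abs:src, abs:href), which jsoup resolves against the document's base URI so that relative links are printed as absolute URLs. As a rough illustration of that resolution step only, the sketch below uses the standard library's java.net.URI. The class name AbsUrlSketch, the resolve helper, and the sample paths are hypothetical, and this is not jsoup's own implementation.

```java
import java.net.URI;

/** Hypothetical stand-alone sketch; not part of jsoup or the program above. */
public class AbsUrlSketch {
    /** Resolves a possibly relative reference against a base URL using java.net.URI. */
    public static String resolve(String baseUrl, String href) {
        return URI.create(baseUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String base = "http://news.ycombinator.com/"; // page URL from the example run
        // The paths below are made-up samples, not values taken from the real page.
        System.out.println(resolve(base, "newest"));  // relative path
        System.out.println(resolve(base, "/rss"));    // root-relative path
        System.out.println(resolve(base, "http://example.com/a.png")); // already absolute
    }
}
```

An already absolute reference comes back unchanged, so the same call can be applied to every link without checking its form first.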

Example output (trimmed)

Fetching http://news.ycombinator.com/...

Media: (38)
 * img:  18x18 ()
 * img:  10x1 ()
 * img:  x ()
 * img:  0x10 ()
 * script:
 * img:  15x1 ()
 * img:  x ()
 * img:  25x1 ()
 * img:  x (Analytics by Mixpan.)

Imports: (2)
 * link  (stylesheet)
 * link  (shortcut icon)

Links: (141)
 * a:   ()
 * a:   (Hacker News)
 * a:   (new)
 * a:   (comments)
 * a:   (leaders)
 * a:   (jobs)
 * a:   (submit)
 * a:   (login)
 * a:   ()
 * a:   (Facebook speeds up PHP)
 * a:   (mcxx)
 * a:   (9 comments)
 * a:   ()
 * a:   ("Tough. Django produces XHTML.")
 * a:   (andybak)
 * a:   (3 comments)
 * a:   ()
 * a:   (More)
 * a:   (Lists)
 * a:   (RSS)
 * a:   (Bookmarklet)
 * a:   (Guidelines)
 * a:   (FAQ)
 * a:   (News News)
 * a:   (Feature Requests)
 * a:   (Y Combinator)
 * a:   (Apply)
 * a:   (Library)
 * a:   ()
 * a:   ()
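
The entry (Analytics by Mixpan.) in the output comes from the program's trim helper, which cuts text longer than width down to width - 1 characters and appends a period. The sketch below copies that helper into a stand-alone class to show the effect. The class name TrimSketch is hypothetical, and the full alt text "Analytics by Mixpanel" is an assumed input, since the original attribute value does not appear in the output.

```java
/** Hypothetical stand-alone copy of the program's trim helper, for illustration. */
public class TrimSketch {
    /** Same logic as ListLinks.trim: keep width - 1 characters and append "." when s is too long. */
    public static String trim(String s, int width) {
        if (s.length() > width)
            return s.substring(0, width - 1) + ".";
        else
            return s;
    }

    public static void main(String[] args) {
        // Assumed alt text (21 characters), trimmed with the width the program uses for alt text.
        System.out.println(trim("Analytics by Mixpanel", 20)); // prints "Analytics by Mixpan."
        // Text at or under the width is returned unchanged.
        System.out.println(trim("Hacker News", 35));           // prints "Hacker News"
    }
}
```

A trimmed result is always exactly width characters long, which keeps the listing columns from running away on long alt or link text.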