Chapter 2: An Overview of WebCore

WebCore can be divided into two principal areas: KWQ and KHTML.

KWQ - KWQ (pronounced "quack") is an adapter layer used to communicate with KHTML. It is written in Objective-C++. The KHTML engine used in Konqueror was written on top of a cross-platform toolkit called Qt, and KWQ is essentially an implementation of the subset of Qt required to make KHTML work on OS X. KWQ files can be found under the "kwq" subdirectory.

KHTML - KHTML is the layout engine and contains all of the code for constructing and rendering HTML and XML. This code is written entirely in C++, with the exception of some glue code that communicates with WebKit (using Objective-C++). In general, code inside the "khtml" subdirectory is designed to be given back to KDE unless that code is contained in a .mm file (signifying Objective-C++ code) or sits inside an #if APPLE_CHANGES block.

When you see APPLE_CHANGES in the code, it means we have deviated from KHTML to solve some problem unique to OS X, and it makes no sense for those changes to be used when building KHTML on other platforms.
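
As a rough illustration of that convention, here is a minimal sketch of a guarded code path. The function platformNote() is hypothetical and is not part of KHTML, and the sketch assumes APPLE_CHANGES is a macro that the OS X build defines to 1; it defaults the macro to 0 only so that the fragment compiles on its own.

```cpp
#include <string>

// Assumption: the OS X build defines APPLE_CHANGES to 1 and other builds
// leave it undefined. The default below exists only for this sketch.
#ifndef APPLE_CHANGES
#define APPLE_CHANGES 0
#endif

// Hypothetical helper, not a real KHTML function.
std::string platformNote()
{
#if APPLE_CHANGES
    // OS X-only deviation; not meant for KHTML builds on other platforms.
    return "OS X specific path";
#else
    // Shared path, the kind of code intended to be given back to KDE.
    return "shared KHTML path";
#endif
}
```

Built without the macro, platformNote() takes the shared branch; an OS X build that defines APPLE_CHANGES would compile the other branch instead.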

KHTML

The "khtml" directory contains only seven subdirectories. They are as follows:

  • css - The CSS subdirectory contains the implementation of CSS parsing, the CSS object model, and the code for handling the computation of style on elements. It contains data structure implementations for sheets, rules, declarations, and selectors, as well as for CSS unit types.
  • dom - The DOM subdirectory contains the C++ language bindings for the DOM, both HTML and XML. These language bindings are not used internally by the layout engine, but are instead designed to be used by JavaScript only. Each interface in the DOM specification is represented by a corresponding C++ class that wraps the real implementation of that DOM element. In effect, each class in this directory serves as a pass-through from JS to the real element implementation.
  • ecma - The ecma subdirectory contains the JS glue code that connects JS and the DOM. Whenever a script tries to invoke a method or get/set a property on a JS object that represents a DOM object, the glue code for that object is invoked. The glue code then passes the request through to the DOM code (in the dom subdirectory), which in turn passes the request through to the real implementation.

    [Script accesses property] -> [JS Object] -> [Ecma Glue Code] -> [DOM Object] -> [DOM Object Implementation]
  • misc - Contains some miscellaneous helper classes used by the rest of the code. The most important object found here is the implementation of KHTML's memory cache in loader.h/.cpp. This cache is designed to hold scripts, stylesheets, and images. Scripts and stylesheets are stored in raw string form and so must still be compiled/parsed even when pulled out of this cache. Images are stored in a decoded form, so that they need not be re-decoded if they are contained in the cache.

    The cache itself uses the LRU-SP (size-adjusted and popularity-aware LRU) algorithm for storing objects in memory and scales dynamically based on the amount of memory available on the system.
  • html - Contains the implementation of the HTML DOM, as well as the HTML parser and the HTML tokenizer. The raw HTML data is sent to the parser, which constructs a tree of HTML DOM implementation objects.
  • xml - Contains the implementation of the DOM Core. The HTML implementation objects derive from base class implementations contained in this directory. In addition, XML document tree construction is handled here. Parsing is handled over in KWQ using expat, but listeners in this subdirectory build up the tree in response to callbacks from the expat parser.
  • rendering - The implementation of the rendering model for KHTML. Classes in this subdirectory represent rectangles that you actually see on screen (or when printing). They correspond to objects in the DOM and are constructed based on the computed style information for DOM objects.
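
The pass-through chain described for the ecma and dom subdirectories can be sketched roughly as follows. All three class names (ElementImpl, Element, ElementGlue) and the single "id" property are hypothetical stand-ins chosen for illustration; the real classes and their interfaces are considerably richer.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for a real element implementation (html/ or xml/).
class ElementImpl {
public:
    explicit ElementImpl(std::string id) : m_id(std::move(id)) {}
    const std::string& id() const { return m_id; }
    void setId(const std::string& id) { m_id = id; }
private:
    std::string m_id;
};

// Hypothetical stand-in for a DOM binding class (dom/): it only forwards.
class Element {
public:
    explicit Element(ElementImpl* impl) : m_impl(impl) {}
    std::string id() const { return m_impl ? m_impl->id() : std::string(); }
    void setId(const std::string& id) { if (m_impl) m_impl->setId(id); }
private:
    ElementImpl* m_impl; // not owned in this sketch
};

// Hypothetical stand-in for ecma glue: maps a JS property name to a DOM call.
class ElementGlue {
public:
    explicit ElementGlue(Element element) : m_element(element) {}
    std::string get(const std::string& property) const
    {
        if (property == "id")
            return m_element.id();
        return std::string(); // unknown property in this sketch
    }
    void put(const std::string& property, const std::string& value)
    {
        if (property == "id")
            m_element.setId(value);
    }
private:
    Element m_element;
};
```

A script writing the "id" property would reach ElementGlue::put, which forwards to Element::setId, which in turn forwards to ElementImpl::setId, mirroring the arrow diagram in the ecma entry.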

Example: "Hello World" HTML Document

Consider the following HTML document:

Hello world.

Let's follow the construction of this document from parsing through to the final rendering on screen. From WebCore's perspective, the action starts in an object called the KHTMLPart. The KHTMLPart is a container object that houses the current document, its JS objects, its render tree, and its stylesheets.

The KHTMLPart's begin method creates the appropriate document (HTML or XML) based on the MIME type sent by the Web server.
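
A minimal sketch of that choice, assuming a simple factory function: createDocumentForMimeType and the three document types below are hypothetical, and the rule used here (text/html gets an HTML document, everything else is treated as XML) is an assumption made for the example rather than the actual mapping.

```cpp
#include <memory>
#include <string>

// Hypothetical document types, reduced to a label for illustration.
struct Document {
    virtual ~Document() {}
    virtual std::string kind() const = 0;
};
struct HTMLDocument : Document {
    std::string kind() const override { return "html"; }
};
struct XMLDocument : Document {
    std::string kind() const override { return "xml"; }
};

// Hypothetical factory. Assumption: only "text/html" selects the HTML
// document; the real mapping of MIME types is not described here.
std::unique_ptr<Document> createDocumentForMimeType(const std::string& mimeType)
{
    if (mimeType == "text/html")
        return std::unique_ptr<Document>(new HTMLDocument);
    return std::unique_ptr<Document>(new XMLDocument);
}
```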

The document creates an object called an HTMLTokenizer, whose implementation can be found in khtml/html/htmltokenizer.cpp. The tokenizer's job is to scan the bytes sent from the network and construct tokens out of the source. For example, it figures out when you have an open tag or a close tag, or what kinds of attributes go with a specific tag.
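
To make the idea of tokenization concrete, here is a deliberately tiny sketch. It is not the HTMLTokenizer: the Token struct and tokenize function are hypothetical, attributes, comments, and entities are ignored, and the whole source is assumed to be available at once rather than arriving in chunks from the network.

```cpp
#include <string>
#include <vector>

// Hypothetical token type: a tag name or a run of text.
struct Token {
    enum Type { StartTag, EndTag, Text };
    Type type;
    std::string data;
};

// Splits markup into start tags, end tags, and text runs.
std::vector<Token> tokenize(const std::string& source)
{
    std::vector<Token> tokens;
    std::string::size_type pos = 0;
    while (pos < source.size()) {
        if (source[pos] == '<') {
            std::string::size_type close = source.find('>', pos);
            if (close == std::string::npos)
                break; // incomplete tag; a real tokenizer would wait for more data
            bool isEnd = close > pos + 1 && source[pos + 1] == '/';
            std::string::size_type nameStart = pos + (isEnd ? 2 : 1);
            tokens.push_back({isEnd ? Token::EndTag : Token::StartTag,
                              source.substr(nameStart, close - nameStart)});
            pos = close + 1;
        } else {
            std::string::size_type next = source.find('<', pos);
            if (next == std::string::npos)
                next = source.size();
            tokens.push_back({Token::Text, source.substr(pos, next - pos)});
            pos = next;
        }
    }
    return tokens;
}
```

Given the input `<p>Hello world.</p>`, this sketch yields three tokens: a start tag p, the text run, and an end tag p.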

The tokenizer in turn creates a KHTMLParser. The implementation of the parser can be found in khtml/html/htmlparser.cpp. The parser's job is to take the tokens it receives from the HTML tokenizer and build DOM nodes. The entire construction of the DOM tree for an HTML document happens from within the parser.
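
The step from tokens to DOM nodes can be sketched with a stack of open elements. This is a simplified illustration under stated assumptions, not the KHTMLParser: Token, Node, and buildTree are hypothetical names, and none of the error recovery that real HTML parsing requires is attempted.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token type, as a tokenizer might produce it.
struct Token {
    enum Type { StartTag, EndTag, Text };
    Type type;
    std::string data;
};

// Hypothetical DOM node: a tag name (or "#text") plus owned children.
struct Node {
    std::string name;
    std::string text;
    std::vector<std::unique_ptr<Node>> children;
};

std::unique_ptr<Node> buildTree(const std::vector<Token>& tokens)
{
    std::unique_ptr<Node> root(new Node);
    root->name = "#document";
    std::vector<Node*> openElements(1, root.get());

    for (const Token& token : tokens) {
        Node* current = openElements.back();
        if (token.type == Token::EndTag) {
            // Close only a matching element, and never pop the document.
            if (openElements.size() > 1 && current->name == token.data)
                openElements.pop_back();
            continue;
        }
        std::unique_ptr<Node> node(new Node);
        if (token.type == Token::StartTag) {
            node->name = token.data;
        } else {
            node->name = "#text";
            node->text = token.data;
        }
        Node* raw = node.get();
        current->children.push_back(std::move(node));
        if (token.type == Token::StartTag)
            openElements.push_back(raw); // later tokens become its children
    }
    return root;
}
```

Each start tag creates a node and makes it the current insertion point, text becomes a child of the current node, and a matching end tag restores the previous insertion point.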

When the parser encounters stylesheets such as the one contained in the example above, it creates an instance of a CSS parser to build up a CSS object model for the rules, declarations and selectors contained in the sheet. The implementation of the CSS parser can be found in khtml/css/cssparser.cpp.

As sheets complete construction, they are added to the document. The parsing of a sheet does not delay the construction of the DOM tree by the parser, unless the sheet is contained inline inside the HTML (as in the example above).

As soon as style information becomes available (i.e., all stylesheets have loaded and been attached to the document), style information is computed for each element. The method that does this, styleForElement, can be found in khtml/css/cssstyleselector.cpp. It examines all of the stylesheets in the document and constructs a RenderStyle object that contains all of the computed style information for a given element, in terms that are easily accessible to the render tree.
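
The following sketch shows the general shape of such a computation. Only the names styleForElement and RenderStyle come from the text above; the signature, the Rule and StyleSheet types, the two properties, and their default values are assumptions made for illustration. The sketch matches tag selectors only and lets later rules override earlier ones, which is far simpler than real selector matching and the cascade.

```cpp
#include <string>
#include <vector>

// Simplified computed style; the defaults are assumptions for this sketch.
struct RenderStyle {
    std::string color = "black";
    std::string display = "inline";
};

// Hypothetical rule: one tag selector and one declaration.
struct Rule {
    std::string tagSelector;
    std::string property;
    std::string value;
};

typedef std::vector<Rule> StyleSheet;

// Walks every sheet in order; later matching rules override earlier ones.
RenderStyle styleForElement(const std::string& tagName,
                            const std::vector<StyleSheet>& sheets)
{
    RenderStyle style;
    for (const StyleSheet& sheet : sheets) {
        for (const Rule& rule : sheet) {
            if (rule.tagSelector != tagName)
                continue;
            if (rule.property == "color")
                style.color = rule.value;
            else if (rule.property == "display")
                style.display = rule.value;
        }
    }
    return style;
}
```

The result is a plain value that a render object could hold onto, which is the role the text assigns to RenderStyle.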

Once the style information has been computed, if a corresponding rendering object should be constructed, then that render object is built and the style object is attached to the render object (and owned by the render object).

The process of building RenderObjects and RenderStyles for elements is called attachment. Similarly, the destruction of those objects is called detachment. All DOM implementation objects have an attach() method that is called recursively to build up the render tree, and a detach() method that is called to destroy the render tree.
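
A minimal sketch of recursive attachment and detachment, assuming simplified types: NodeImpl, RenderObject, and RenderStyle below are empty stand-ins, and the boolean flag that decides whether a node gets a renderer is a hypothetical substitute for the real style-based decision.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins; the real classes carry far more state.
struct RenderStyle { };
struct RenderObject {
    std::unique_ptr<RenderStyle> style; // owned by the render object
};

class NodeImpl {
public:
    explicit NodeImpl(bool rendered = true) : m_rendered(rendered) {}

    NodeImpl* appendChild(std::unique_ptr<NodeImpl> child)
    {
        m_children.push_back(std::move(child));
        return m_children.back().get();
    }

    // Builds this node's render object (if any), then recurses.
    void attach()
    {
        if (m_rendered) {
            m_renderer.reset(new RenderObject);
            m_renderer->style.reset(new RenderStyle);
        }
        for (auto& child : m_children)
            child->attach();
    }

    // Tears down the children's render objects first, then this node's.
    void detach()
    {
        for (auto& child : m_children)
            child->detach();
        m_renderer.reset();
    }

    bool hasRenderer() const { return m_renderer != nullptr; }

private:
    bool m_rendered;
    std::unique_ptr<RenderObject> m_renderer;
    std::vector<std::unique_ptr<NodeImpl>> m_children;
};
```

Calling attach() on the root builds render objects for the whole subtree, skipping nodes that should not be rendered, and detach() on the root removes them all again.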

Once the render tree has been built, it must receive a layout. RenderObjects implement a layout method that is used to determine the position and size of the RenderObject boxes on screen. Layout is a recursive function, so calling layout on the root RenderObject results in the entire rendering tree being flowed.
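
The recursion can be illustrated with a toy vertical-stacking layout. RenderBox here is a hypothetical type, and the rule it applies (children are stacked top to bottom, and a container is as tall as the sum of its children) is an assumption chosen to keep the example short; real layout handles widths, margins, inline flow, and much more.

```cpp
#include <memory>
#include <vector>

// Hypothetical box: leaves have an intrinsic height, containers stack children.
struct RenderBox {
    int intrinsicHeight;
    int y;      // offset from the top of the parent box
    int height; // computed by layout()
    std::vector<std::unique_ptr<RenderBox>> children;

    explicit RenderBox(int h = 0) : intrinsicHeight(h), y(0), height(0) {}

    RenderBox* addChild(int h)
    {
        children.push_back(std::unique_ptr<RenderBox>(new RenderBox(h)));
        return children.back().get();
    }

    // Lays out each child first, then positions it below its siblings.
    void layout()
    {
        if (children.empty()) {
            height = intrinsicHeight;
            return;
        }
        int offset = 0;
        for (auto& child : children) {
            child->layout();
            child->y = offset;
            offset += child->height;
        }
        height = offset;
    }
};
```

A single call to layout() on the root visits every box, which is the sense in which calling layout on the root flows the entire tree.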

Finally once all of the rendering objects have been placed at the correct positions, they are painted. All RenderObjects implement paint methods. Painting is also recursive.
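
Recursive painting can be sketched in the same spirit. PaintedBox is a hypothetical type, the "display list" is just a vector of strings, and the order used here (a box paints itself before its children, with offsets accumulated down the tree) is an assumption for the example rather than a description of the actual paint phases.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical box that already has a position from layout.
struct PaintedBox {
    std::string name;
    int y; // offset relative to the parent box
    std::vector<std::unique_ptr<PaintedBox>> children;

    PaintedBox(const std::string& n, int relativeY) : name(n), y(relativeY) {}

    PaintedBox* addChild(const std::string& n, int relativeY)
    {
        children.push_back(std::unique_ptr<PaintedBox>(new PaintedBox(n, relativeY)));
        return children.back().get();
    }

    // Records this box at its absolute position, then paints the children.
    void paint(std::vector<std::string>& displayList, int parentY = 0) const
    {
        int absoluteY = parentY + y;
        displayList.push_back(name + "@" + std::to_string(absoluteY));
        for (const auto& child : children)
            child->paint(displayList, absoluteY);
    }
};
```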

So to summarize, the steps involved in building up a document are:

  • HTML Tokenization
  • HTML Parsing
  • Stylesheet Parsing/Construction
  • Document Tree Construction
  • RenderStyle and RenderObject Construction
  • RenderObject Layout
  • RenderObject Painting

Each of these steps will be covered in greater detail in subsequent documents.