Parsing XML. A Basic XML Document. Differences Between XML and HTML. Common Mistakes: White Space, Closing Tags, Nesting Tags, Root Element. HTML is about displaying information, while XML is about carrying information.
This tutorial will teach you the basics of XML. The tutorial is divided into sections such as XML Basics, Advanced XML, and XML Tools.
Were the value simply letter, the template would match letter elements throughout the document. Stream-oriented facilities require less memory and, for certain tasks based on a linear traversal of an XML document, are faster and simpler than other alternatives.
This line refers to the DTD or schema (your list of elements and rules) to be used to validate that document. This example assumes that your element list file is named filename. Entities can be phrases of text or special characters. They can point internally or externally. Entities must be declared and expressed properly to avoid errors and to ensure proper display.
You cannot type special characters directly into your content. To use a symbol in your text, you must set it up as an entity using its character code. You can set up phrases such as a company name as an entity, then type the entity throughout your content. This code identifies the text that stands in for the entity. Using entities might help you avoid typing the same phrase or information repeatedly.
It can also make it easier to adjust the text—perhaps if the company name changes—in many places with a simple adjustment in the entity definition. If it displays your elements, attributes, and content, then the XML is well formed. If instead errors are displayed, you likely have a syntax error and need to review your document carefully for typos or missing tags and punctuation.
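As a rough illustration of the entity mechanism described above, the following Python sketch declares a general entity in an internal DTD subset and lets the standard-library parser expand it. The entity name `company`, the element names, and the company text are all invented for this example, not taken from the tutorial.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the entity name "company" and its replacement
# text are assumptions made for this sketch.
DOCUMENT = """<!DOCTYPE notice [
  <!ENTITY company "Example Widgets, Inc.">
]>
<notice>
  <line>Published by &company;</line>
  <line>&#169; &company;</line>
</notice>"""


def expand_lines(xml_text):
    """Parse the document and return the text of each <line> element.

    The parser replaces &company; with the declared text and &#169;
    (a character reference) with the copyright sign.
    """
    root = ET.fromstring(xml_text)
    return [line.text for line in root.findall("line")]


if __name__ == "__main__":
    for text in expand_lines(DOCUMENT):
        print(text)
```

Changing the company name now means editing only the `<!ENTITY ...>` line; every `&company;` reference picks up the new text the next time the document is parsed.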
As mentioned in Nest the elements, an element that contains another element is the parent of that contained element. Remember to nest your sibling elements properly, as well. Listing 7 shows well-formed and properly nested XML. The line breaks make it easier for you to read your code and do not affect the XML. You might wish to experiment with your test files, and move the end tags and beginning tags, to become familiar with the resulting error messages. In Figure 1, your elements show up clearly when viewed within Internet Explorer.
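If you would rather experiment outside a browser, a small script can report the same kind of nesting errors. This is a minimal sketch using Python's standard-library parser; the recipe markup is a made-up test case, and the exact wording of the error message depends on the parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical test documents for this sketch.
WELL_NESTED = "<recipe><title>Ice Cream Sundae</title><step>Scoop.</step></recipe>"
# Here the end tags for <title> and <recipe> have been swapped.
BADLY_NESTED = "<recipe><title>Ice Cream Sundae</recipe></title>"


def is_well_formed(xml_text):
    """Return (True, None) if the text parses, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, None


if __name__ == "__main__":
    for label, text in (("well nested", WELL_NESTED), ("badly nested", BADLY_NESTED)):
        ok, message = is_well_formed(text)
        print(label, "->", "well formed" if ok else "error: " + message)
```

Note that this checks well-formedness only; it says nothing about whether the document is valid against a DTD or schema.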
Beginning and end tags surround your content. Beyond a few simple rules, you have flexibility in designing your XML elements and attributes. Typing an XML document is also not difficult.
What is difficult is figuring out what you need from your documents in terms of sortability or searchability, then designing elements and attributes to meet your needs. When you have a good idea of your goals and how to mark up your content, you can build efficient elements and attributes.
From that point, careful tagging is all you need to create well-formed and valid XML.
Are we done yet? Not quite.
When you view the XML document in Firefox, you should see something similar to the result pictured in Figure 2. Internet Explorer interprets the result as HTML code, even when the style sheet clearly specifies that it will output text. As a result, whitespace is collapsed and our whole document appears on one line.
For this reason, it is not yet practical to rely on browser support for XSLT in a real-world website. You should see something similar to Figure 2. What happens if you need to transform your own XML document into an XML document that meets the needs of another organization or person? Not to worry — XSLT will save the day! You see, Web browsers only supply collapsible tree formatting for XML documents without style sheets.
XML documents that result from a style sheet transformation are displayed without any styling at all, or at best are treated as HTML, which is not at all the desired result. There are several things that need to be added to your style sheet to signal to the browser that the document is more than a plain XML file, though. Here we have declared a default namespace for tags without prefixes in the style sheet. Next up, we can flesh out the output element to more fully describe the output document type. In addition to the method and indent attributes, we have specified a number of new attributes here. Internet Explorer for Windows displays XHTML documents in Quirks Mode when an XML declaration is present, so by omitting it we can ensure that this browser will display the document in the more desirable Standards Compliance mode.
The rest of the style sheet is as it was for the HTML output example we saw above.
Now, we need to identify exactly what we need for our news items, binary files, and Web copy. We must also manage and track site administrators using XML. Compared to our article content type, news will be fairly straightforward. We will need to track these pieces of information: The easiest way to keep track of copy is to treat each piece a little like an article. An XML document that tracks a piece of Web copy will look like this:
We will need to keep track of each administrator on the site, as these are the folks who can log in and make changes to advertisement copy, articles, news pieces, and binary files. After that, you should have enough of a working knowledge of XML and its wacky family to really start development. In fact, in many contexts, consistency can be a very beautiful thing.
Remember that XML allows you to create any kind of language you want. In many cases, as long as you follow the rules of well-formedness, just about anything goes in XML. However, there will come a time when you need your XML document to follow some rules — to pass a validity test — and those times will require that your XML data be consistently formatted.
What we need is a way to enforce that kind of rule. In XML, there are two ways to set up consistency rules: DTDs and XML Schema. A DTD (document type definition) is a tried and true, if old-fashioned, way of achieving consistency. Each of these technologies contains lots of hidden nooks and crannies crammed with rules, exceptions, notations, and side stories.
Speaking of side stories, did you know that DTD actually stands for two things? It stands not just for document type definition, but also document type declaration.
The declaration consists of the lines of code that make up the definition. Just a warning before we start this chapter: As for the first question, many possible answers spring to mind: Using a system to ensure consistency allows your XML documents to interact with all kinds of applications, contexts, and business systems, not just your own.
The way DTDs work is relatively simple. A DTD might look something like this:
Those of you who are paying attention should have noticed some remarkable similarities between this DTD and the Letter to Mother example that we worked on in Chapter 2, XML in Practice. In fact, if you look closely, each line of the DTD provides a clue as to how our letter should be structured. This is called an element declaration. You can declare elements in any order you want, but they must all be declared in the DTD. A DTD element declaration consists of a tag name and a definition in parentheses.
These parentheses can contain rules for any of: In this case, we want the letter element to contain, in order, the elements to, from, and message. As you can see, the sequence of child elements is comma-delimited. In fact, to be more precise, the sequence not only specifies the order in which the elements should appear, but also how many of each element should appear.
In this case, the element declaration specifies that one of each element must appear in the sequence. If our file contained two from elements, for example, it would be as invalid as if it listed the message element before to.
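To make the rule concrete, here is a hedged sketch of what that declaration demands. The declaration string is an assumed reconstruction based on the description above, and because Python's standard library does not validate against DTDs, the check below simply imitates the sequence rule by hand; it is not a real validator.

```python
import xml.etree.ElementTree as ET

# Assumed form of the declaration discussed above (kept for reference only;
# the standard-library parser does not validate against DTDs).
LETTER_DECLARATION = "<!ELEMENT letter (to, from, message)>"

# The sequence the declaration requires: exactly one of each, in this order.
REQUIRED_SEQUENCE = ["to", "from", "message"]


def follows_letter_sequence(xml_text):
    """Hand-rolled check that <letter> holds exactly to, from, message in order."""
    root = ET.fromstring(xml_text)
    children = [child.tag for child in root]
    return root.tag == "letter" and children == REQUIRED_SEQUENCE


# Hypothetical sample documents.
VALID = "<letter><to>Mother</to><from>Me</from><message>Hello</message></letter>"
TWO_FROMS = ("<letter><to>Mother</to><from>Me</from><from>Sis</from>"
             "<message>Hello</message></letter>")
WRONG_ORDER = "<letter><message>Hello</message><to>Mother</to><from>Me</from></letter>"

if __name__ == "__main__":
    for name, doc in (("valid", VALID), ("two froms", TWO_FROMS), ("wrong order", WRONG_ORDER)):
        print(name, "->", follows_letter_sequence(doc))
```

All three sample documents are well formed; only the first satisfies the sequence, which is the difference between well-formedness and validity.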
How will you do that? With a neat little system of notation, defined in Table 3. After the letter declaration, we see three more declarations that use the #PCDATA notation. Whenever you see this notation in a DTD, you know that the element must contain only text. A mixed-content declaration, by contrast, allows the paragraph element to contain any combination of plain text and b, i, u, and highpriority elements. Note that with mixed content like this, you have no control over the number or order of the elements that are used.
What about elements such as hr and br, which in HTML contain no content at all? These are called empty elements, and are declared in a DTD as follows: Remember attributes? An attribute declaration is structured differently from an element declaration.
For one thing, we define it with !ATTLIST rather than !ELEMENT. Also, we must include in the declaration the name of the element that contains the attribute(s), followed by a list of the attributes and their possible values. Basically, this attribute can contain any string of characters or numbers.
In DTD-speak, this means that the attribute is optional. Instead of allowing any arbitrary text, however, the DTD limits the values to either male or female. If, in our document, an actor element fails to contain a gender attribute, or contains a gender attribute with values other than male or female, then our document would be deemed invalid.
The actorid attribute has been designated an ID. In DTD-speak, an ID attribute must contain a unique value, which is handy for product codes, database keys, and other identifying factors. In our example, we want the actorid attribute to uniquely identify each actor in the list. The ID type set for the actorid attribute ensures that our XML document is valid if and only if a unique actorid is assigned to each actor.
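The two attribute rules described above (an enumerated gender and a unique actorid) can be sketched as follows. The ATTLIST text is an assumed reconstruction from the prose, the sample data is invented, and the function is a hand-written imitation of the checks a validating parser would perform, since Python's standard library does not validate DTDs.

```python
import xml.etree.ElementTree as ET

# Assumed shape of the attribute declaration described in the text.
ACTOR_ATTLIST = """<!ATTLIST actor
  actorid ID             #REQUIRED
  gender  (male|female)  #REQUIRED>"""

ALLOWED_GENDERS = {"male", "female"}


def attribute_problems(xml_text):
    """List violations of the gender enumeration and actorid uniqueness."""
    root = ET.fromstring(xml_text)
    problems = []
    seen_ids = set()
    for actor in root.iter("actor"):
        actor_id = actor.get("actorid")
        if actor_id is None:
            problems.append("actor without actorid")
        elif actor_id in seen_ids:
            problems.append("duplicate actorid: " + actor_id)
        else:
            seen_ids.add(actor_id)
        gender = actor.get("gender")
        if gender not in ALLOWED_GENDERS:
            problems.append("missing or disallowed gender: " + repr(gender))
    return problems


# Hypothetical sample data.
GOOD = ('<actors><actor actorid="a1" gender="male">First</actor>'
        '<actor actorid="a2" gender="female">Second</actor></actors>')
BAD = ('<actors><actor actorid="a1" gender="male">First</actor>'
       '<actor actorid="a1" gender="unknown">Second</actor></actors>')

if __name__ == "__main__":
    print(attribute_problems(GOOD))
    print(attribute_problems(BAD))
```

One simplification to keep in mind: a real DTD treats ID values as unique across the whole document, not just among actor elements, and requires them to be legal XML names (which is why the sample IDs start with a letter).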
Incidentally, if you want to declare an attribute that must contain a reference to a unique ID that is assigned to an element somewhere in the document, you can declare it with the IDREF attribute type. An entity is a piece of XML code that can be used and reused in a document with an entity reference. There are different types of entities, including general, parameter, and external. General entities are basically used as substitutes for commonly-used segments of XML code.
For example, here is an entity declaration that holds the copyright information for a company: Parameter entities are both defined and referenced within DTDs. What this says is that each of the elements paragraph, intro, sidebar, and note can contain regular text as well as b, i, u, citation, and dialog elements. Not only does the use of a parameter entity reduce typing, it also simplifies maintenance of the DTD.
External entities point to external information that can be copied into your XML document at runtime. For example, you could include a stock ticker, inventory list, or other file, using an external entity. An external DTD is usually a file with a file extension of .dtd. First, you must edit the XML declaration to include the attribute.
This will search for the letter. If the DTD lives on a Web server, you might point to that instead: Finally, XML Schema provides very fine control over the kinds of data contained in an element or attribute. Now, for some major drawbacks. Most of the criticism aimed at XML Schema is focused on its complexity and length.
Okay, now you know a lot more about DTDs than you did before. The first thing you do is take a look at the dozens of corporate memos you and your colleagues have received in the past few months. After a day or two of close examination, a pattern emerges. Although your first impulse might be to run out and create a sample XML memo document, please resist that urge for now. Because these memos are internal to the company, and there may be a need for a separate external memo DOCTYPE, you decide to use internalmemo as your root element name:
The first element, the root element, is internalmemo. This element will contain all the other elements, which hold date, sender, recipient, subject line, and all other information. Because these represent a lot of elements, it would be useful to split your document into two logical partitions: a header and a body. The header will contain recipient, subject line, date, and other information.
The body will contain the actual text of the memo. In DTD syntax, the above declaration states that our internalmemo element must contain one header element and one body element. Next, we will indicate which elements these will contain.
In DTD syntax, the above declaration states that the header element must contain single date, sender, and recipients elements, an optional blind-recipients element, and then a subject element. In DTD syntax, the above declaration states that the body element must contain one or more para elements, followed by a single sig element.
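A DTD content model is essentially a regular expression over child element names, and that idea can be sketched in a few lines of Python. The three patterns below are paraphrased from the prose descriptions of internalmemo, header, and body; the sample memo and the helper names are hypothetical, and this is an illustration of the rules rather than a substitute for a validating parser.

```python
import re
import xml.etree.ElementTree as ET

# Content models paraphrased from the text, written as regular expressions
# over a comma-terminated list of child element names.
CONTENT_MODELS = {
    "internalmemo": r"header,body,",
    "header": r"date,sender,recipients,(blind-recipients,)?subject,",
    "body": r"(para,)+sig,",
}


def children_signature(element):
    """Encode an element's child names as 'name,name,...,' for matching."""
    return "".join(child.tag + "," for child in element)


def structure_errors(xml_text):
    """Return the names of elements whose children break their content model."""
    root = ET.fromstring(xml_text)
    errors = []
    for element in root.iter():
        model = CONTENT_MODELS.get(element.tag)
        if model is not None and not re.fullmatch(model, children_signature(element)):
            errors.append(element.tag)
    return errors


# Hypothetical sample memos.
MEMO = ("<internalmemo>"
        "<header><date>2004-01-05</date><sender>A</sender>"
        "<recipients>B</recipients><subject>Budget</subject></header>"
        "<body><para>First.</para><para>Second.</para><sig>A</sig></body>"
        "</internalmemo>")
MISPLACED_SENDER = ("<internalmemo>"
                    "<header><sender>A</sender><date>2004-01-05</date>"
                    "<recipients>B</recipients><subject>Budget</subject></header>"
                    "<body><para>First.</para><sig>A</sig></body>"
                    "</internalmemo>")

if __name__ == "__main__":
    print(structure_errors(MEMO))
    print(structure_errors(MISPLACED_SENDER))
```

The `?` and `+` in the patterns play the same role as the `?` and `+` occurrence indicators in DTD notation: optional, and one or more.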
Most of the other elements will contain plain text, except the para elements, in which we will allow bold and italic text formatting. That was simple enough. Those pieces of information are hardly ever displayed on a document — they are used only for administrative purposes.
In any case, we want to be able to control the data that document creators put in for values such as priority. The best way to store these pieces of information is to add them as attributes to the root element. To do that, we need to add an attribute declaration to our DTD. The result should look a lot like Figure 3. Do you see how, under Results, it reads "No errors or warnings found."?
In Dreamweaver MX, the results list for a valid document is simply empty, and the status bar beneath the list reads Complete. What happens if some things are out of place? What would happen then? Notice that Dreamweaver MX tells you where the problem lies with a specific line number and provides a description of the problem. The validator catches that too, as you can see in Figure 3. Figure 3. Error resulting from a misplaced element. Again, the validator gives you a line number and a description that can lead you to resolve the problem.
All you need to do is put the sender element back in the prescribed order, and the document will validate once more. In that case, we embedded the DTD right into the file. You now have a reusable DTD that you can apply to other internal memos. We now understand articles, news stories, binary files, and Web copy, and are well on our way to completing the requirements-gathering phase of the project — we can start coding soon!
If you recall, we are tracking author, status, keyword, and other vital information in separate files. That is, each individual article, news story, binary file, and Web copy file keeps track of its own keywords, status, author, and dates.
If we wanted to display all documents for a certain author, we would have to dig through all of our files to find all the matches. Never fear — I have a proposal that will solve this problem.
In fact, the rest of this chapter will be devoted to tackling this issue.
With any luck, it will also give you some insights into the ways in which you can analyze requirements and come up with more architecturally sound XML designs.
The other problem is a little less obvious. To our application, these three names are different, and articles will thus be listed under three different authors. To solve this problem, we should create a separate author listing authors. Once we have this figured out, we can get rid of the author element in all the other content types, and replace it with an authorid element.
Handling our authors this way also allows us to track other information about authors, such as their email addresses, their bylines (in case they want to publish under pseudonyms), and other such information. Instead of a separate author element, we would add an authorid element to our articles. All we need to do is use this author ID in our articles, news stories, and all other content we add to our CMS; this ID is used to look up the author and retrieve the information we need.
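The lookup described above might work roughly like this. The layout of the author listing, the `id` attribute, the child element names, and the sample people are all assumptions made for the sketch; only the idea of an authorid element pointing into a central author list comes from the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical author listing and article; names and structure are invented.
AUTHORS_XML = """<authors>
  <author id="1"><name>Jane Example</name><email>jane@example.com</email></author>
  <author id="2"><name>Sam Sample</name><email>sam@example.com</email></author>
</authors>"""

ARTICLE_XML = """<article>
  <authorid>2</authorid>
  <headline>Getting Started</headline>
</article>"""


def author_for(article_xml, authors_xml):
    """Resolve an article's <authorid> against the central author listing."""
    article = ET.fromstring(article_xml)
    authors = ET.fromstring(authors_xml)
    author_id = article.findtext("authorid")
    # ElementTree's path syntax supports [@attribute='value'] predicates.
    match = authors.find("author[@id='{}']".format(author_id))
    if match is None:
        return None
    return {"name": match.findtext("name"), "email": match.findtext("email")}


if __name__ == "__main__":
    print(author_for(ARTICLE_XML, AUTHORS_XML))
```

Because every article stores only the ID, correcting a misspelled name means editing one entry in the author listing rather than every article.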
The big question remains: To be completely honest, most articles, news stories, and such will be submitted to the site through our administrative tool. This tool will have the necessary forms that will restrict data entry to certain fields.
In other words, our administrative tool will do most of the work of validating our content. However, I think it would be good practice to develop a DTD for our article content type — after all, this is one of the most important document types we have in our system, and it has to be done right.
Although we have declared our body element to contain character data, our article bodies will indeed be formatted using HTML tags. Try writing DTDs for these as well.
We used it to transform an XML letter to mother into something that could be displayed in a browser window. XPath is used in a variety of applications and technologies; however, XSLT is where its power and versatility really shine. For all intents and purposes, XPath is a query language. It uses a simple notation that is very similar to directory paths (hence the name XPath).
When we put together a template, we normally use XPath to establish a match. For example, we can always handle the root of an XML document like this: With XPath, you can select all elements that have a particular tag name. Or, you could match certain elements depending on their location within an XML file.
As you can see, the basic XPath syntax looks a lot like a file path on your computer. But you can go a step further and set conditions on which elements are matched within your specified path. These conditions are called predicates, and appear within square brackets following the element name you wish to set conditions for. The @ symbol identifies priority in this example as an attribute name, not a tag name. XPath also has a number of useful functions built in. For example, if you need to grab the first or last element of a series, you can use XPath to do so.
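The path, predicate, and position ideas above can be tried out without an XSLT processor. This sketch uses Python's standard-library ElementTree, which implements only a limited subset of XPath (enough for tag paths, attribute predicates, and first/last positions); the sample letters and the helper name `recipients` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data for this sketch.
LETTERS = """<letters>
  <letter priority="high"><to>Mother</to><message>Call me</message></letter>
  <letter priority="low"><to>Father</to><message>No rush</message></letter>
  <letter priority="high"><to>Sister</to><message>Urgent</message></letter>
</letters>"""

ROOT = ET.fromstring(LETTERS)


def recipients(path):
    """Return the <to> text of every letter matched by a path expression."""
    return [letter.findtext("to") for letter in ROOT.findall(path)]


def all_messages():
    """Select every message element, wherever it sits below the root."""
    return [message.text for message in ROOT.findall(".//message")]


if __name__ == "__main__":
    print(all_messages())                          # every message, by tag name
    print(recipients("letter[@priority='high']"))  # predicate on an attribute
    print(recipients("letter[1]"))                 # first letter in the series
    print(recipients("letter[last()]"))            # last letter in the series
```

Full XPath, as used inside XSLT templates, offers many more axes and functions than this subset, but the bracketed predicate syntax is the same.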
Although most practical applications are relatively simple, XPath can get quite twisty when it needs to be.
The XPath Recommendation is quite a useful reference to these areas of complexity. Book chapters provide an excellent opportunity to understand the arbitrary complexity of most XML documents. From the perspective of an XML document designer, however, a book chapter can be intimidatingly complex.