6.9. XML and XML-RPC

I decided to cover chapter 8 only lightly, for two reasons:

  1. XML documents are parsed in a similar way to HTML documents, but are easier to parse.
  2. XML-RPC is a very interesting topic, but it is not closely related to web technologies. I’ll make some comments about XML-RPC below.

6.9.1. XML

The first couple of pages of chapter 8 give a nice overview of what XML is and how it can be used.

XML makes it easy for two computers to communicate complicated information. For example, consider what could happen when you go to the store and buy something. When the item you are buying is scanned at the checkout counter, if the store’s inventory system notices that they are starting to get low on that item, then the store’s computer may generate an XML document ordering more of that item. That document would be sent over the network to the regional warehouse. If the warehouse is low on the item, an order in XML format will be sent to the supplier’s order taking computer. Since this is a common item for the store to purchase, no human intervention is needed. The first people to even know of the order are folks loading the trucks. Their computer prints out that some predetermined quantity (a few cases or a pallet) of the item should be loaded on a truck headed to the store. The workers at the store may not be aware of the order until the truck arrives at their unloading dock.

XML can do such amazing tasks because its tags describe the type, meaning or context of the data rather than how it should look on the screen. XML is all about content, whereas HTML describes page layout.

You should be able to see that it is important for programs to be able to generate XML files and to interpret (parse) them. The documents may be sent over the network using HTTP in the same way that HTML documents are transferred.

6.9.2. Parsing XML with DOM (Document Object Model)

The textbook shows how an XML document can be parsed into a DOM tree and how the tree can then be traversed to look for needed information. This is similar to what we saw with html5lib for parsing HTML documents. Take a quick look at the source code on pages 150 to 153.
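
The parse-then-traverse idea can be sketched with the standard library's xml.dom.minidom module. This is not the textbook's code: the document, its element names, and the helper functions text_of and read_items are all invented for illustration, and the sketch assumes Python 3.

```python
from xml.dom import minidom

# Hypothetical reorder document; element and attribute names are invented.
ORDER_XML = """<order store="17">
  <item sku="A100"><name>Coffee filters</name><quantity>24</quantity></item>
  <item sku="B200"><name>Paper towels</name><quantity>12</quantity></item>
</order>"""


def text_of(element):
    """Concatenate the text nodes directly under an element."""
    return "".join(node.data for node in element.childNodes
                   if node.nodeType == node.TEXT_NODE)


def read_items(xml_text):
    """Parse the text into a DOM tree and collect (sku, name, quantity)."""
    dom = minidom.parseString(xml_text)
    items = []
    for item in dom.getElementsByTagName("item"):
        name = item.getElementsByTagName("name")[0]
        quantity = item.getElementsByTagName("quantity")[0]
        items.append((item.getAttribute("sku"), text_of(name),
                      int(text_of(quantity))))
    return items


if __name__ == "__main__":
    for sku, name, quantity in read_items(ORDER_XML):
        print(sku, name, quantity)
```

Because the tags name the meaning of each value (sku, name, quantity), the receiving program can pick out what it needs without knowing anything about presentation.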

Next, the book covers how to generate XML documents with DOM, which is straightforward, sequential programming.
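
As a rough illustration of generating a document with DOM calls, here is a minimal sketch using xml.dom.minidom. The build_order function and the order/item/quantity element names are hypothetical and are not taken from the textbook; Python 3 is assumed.

```python
from xml.dom import minidom


def build_order(store_id, items):
    """Build a small order document one node at a time."""
    doc = minidom.Document()
    order = doc.createElement("order")
    order.setAttribute("store", str(store_id))
    doc.appendChild(order)
    for sku, quantity in items:
        item = doc.createElement("item")
        item.setAttribute("sku", sku)
        qty = doc.createElement("quantity")
        # Text content is itself a node that must be attached to its element.
        qty.appendChild(doc.createTextNode(str(quantity)))
        item.appendChild(qty)
        order.appendChild(item)
    return doc


doc = build_order(17, [("A100", 24)])
print(doc.documentElement.toxml())
```

Each step creates a node and appends it to its parent, so the code reads top to bottom in the same order as the resulting document.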

6.9.3. XML-RPC

Presumably, this topic was included here because it uses XML. But XML-RPC is much more about RPC than it is about XML. RPC stands for remote procedure call. The main idea is that an application on a client makes a local procedure call, which causes a message to be sent to a server that executes the procedure and returns a response.
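
The main idea can be sketched without any network at all. The toy classes below (ToyServer and ToyProxy, both invented for this illustration) show how a call that looks local can be turned into a message and executed elsewhere. It is only a sketch of the concept, not how any particular RPC library is implemented.

```python
class ToyServer:
    """Stands in for the remote machine: looks up a procedure by name."""

    def __init__(self):
        self.procedures = {}

    def register(self, name, func):
        self.procedures[name] = func

    def handle(self, message):
        # A message is (procedure name, argument tuple); the reply is the result.
        name, args = message
        return self.procedures[name](*args)


class ToyProxy:
    """Client-side stub: turns a method call into a request message."""

    def __init__(self, send):
        self._send = send  # callable that delivers a message and returns the reply

    def __getattr__(self, name):
        # Only reached for names the proxy does not define itself.
        def remote_call(*args):
            return self._send((name, args))
        return remote_call


server = ToyServer()
server.register("add", lambda a, b: a + b)
proxy = ToyProxy(server.handle)  # a real system would send this over a network
print(proxy.add(2, 3))
```

In a real RPC system the send step would serialize the message, ship it to another machine, and deserialize the reply; the caller's code would look the same.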

6.9.3.1. RPC and Distributed Computing

The original RPC was a library of C routines. Sun Microsystems was the main developer of RPC and used it to develop NFS (Network File System) for Unix. NFS is the most commonly used distributed file system on Unix/Linux computers. Although the older versions of NFS are considered to have major security flaws, the performance of NFS is amazing. I have personally seen very slow computers attached to 10 Mb/s networks work as NFS file servers for huge volumes of files. Surprisingly, the performance, from the user’s perspective, was the same as working from a local hard drive. Dr. Jack Fegreus, Technology Director of Strategic Communications, confirms my casual observation with tests comparing NFS on a Linux computer to the Microsoft Windows SMBFS/CIFS distributed file system over a gigabit network. He concludes that, despite NFS being an older technology than Microsoft’s offering, NFS showed an approximate 2-to-1 performance advantage and was only slightly slower than local disk access. [FEGREUS09] Tests conducted by Dean Irvin in 2003 indicate that NFS outperforms SMBFS and CIFS by close to 3-to-1. [IRVIN03]

The original RPC library is available only in C and is (so I’ve been told) a real challenge to program. So a currently active area of computer science research is the development of simple and efficient means of executing procedures on remote machines. This is sometimes referred to as distributed programming or distributed computing. Distributed computing can be used in a broad range of client/server applications. File and print servers can use it. It is used for parallel programming of problems that are too large to be solved on one computer. It is used anytime you have one control computer that controls multiple slave computers with resources beyond what could be put into one computer. When I worked at Sprint, I worked with a distributed computing system for telephone call processing.

XML-RPC is an interesting implementation, but it is not, by any means, the only project tackling the problem. Other similar attempts include Java RMI, SOAP and CORBA. An interesting Python choice is Python Remote Objects (Pyro). Which of these is best largely depends on your criteria.

  • From a pure Python perspective, Pyro looks interesting, because it establishes remote objects rather than just executing remote functions as the others do. But Pyro only works with Python programs, so it will not be able to solve larger distributed computing needs.
  • CORBA has the most features and thus is used quite a bit in industry. The criticism of CORBA is that it is too large and difficult to learn. My feeling is that what is needed are some simple abstractions that simplify the most common uses of CORBA.
  • The main appeal of XML-RPC is its simplicity of use. It uses XML and HTTP to do its magic, but the programmer never does anything with either of these. Take a look at the example in the next section and you should see why XML-RPC is so popular in the Python community. There is certainly something to be said for being easy to use.
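
To peek at the XML that the programmer normally never sees, the marshalling functions can be called directly. This sketch assumes Python 3, where the client module is named xmlrpc.client (the examples in these notes use the Python 2 name, xmlrpclib); the method name and arguments are just sample values.

```python
import xmlrpc.client

# Build the XML body of a request that calls repeat("abc", 2).
request_xml = xmlrpc.client.dumps(("abc", 2), methodname="repeat")
print(request_xml)

# The receiving side decodes the same text back into Python values.
params, method = xmlrpc.client.loads(request_xml)
print(method, params)

# The reply is also XML: a methodResponse wrapping a single return value.
response_xml = xmlrpc.client.dumps(("abcabc",), methodresponse=True)
print(response_xml)
```

The library performs these steps on every call and wraps the text in an HTTP POST, which is why the calling code can look like an ordinary function call.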

6.9.4. XML-RPC Example

Here is a simple example showing what XML-RPC can do. The sort of half-support of objects is interesting. Remember, the ‘P’ in RPC is for procedure, so what it provides is a set of functions (procedures). Some of those functions may be grouped inside a class definition, but the client cannot treat an object on the server in the same way as it would one of its own objects. Grouping functions on the server into a class does let them share state through instance variables, which are one step up from global variables in this context.

Here is the server code:

import SimpleXMLRPCServer

class StringFunctions(object):
    def _privateFunction(self):
        # This function cannot be called through XML-RPC because it
        # starts with an '_'
        pass

    def chop_in_half(self, astr):
        # '//' is integer division, so the slice index is always an int
        return astr[:len(astr) // 2]

    def repeat(self, astr, times):
        return astr * times

    def set_str(self, astr):
        self.python_string = astr

    def get_str(self):
        return self.python_string

    def set_list(self, alist):
        self.python_list = alist

    def join(self):
        return ' '.join(self.python_list)

server = SimpleXMLRPCServer.SimpleXMLRPCServer(("localhost", 8000),
                                                allow_none = True)
server.register_instance(StringFunctions())
server.register_function(lambda astr: '_' + astr, '_string')
server.serve_forever()

The server registers functions with the SimpleXMLRPCServer object. We’ll see in the client code that the set of methods from the instantiated StringFunctions class become available. The allow_none option to the server is enabled because the set_str and set_list functions do not return any data.
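
The effect of allow_none can be seen by marshalling a None return value by hand. This is a small sketch that assumes Python 3 module names (xmlrpc.client rather than xmlrpclib); the helper try_marshal is invented for the illustration.

```python
import xmlrpc.client


def try_marshal(allow_none):
    """Try to encode None as an XML-RPC response; report any TypeError."""
    try:
        return xmlrpc.client.dumps((None,), methodresponse=True,
                                   allow_none=allow_none)
    except TypeError as exc:
        return "TypeError: %s" % exc


print(try_marshal(False))  # None is not part of the base XML-RPC types
print(try_marshal(True))   # encoded with the <nil/> extension element
```

Without the option, a function that falls off the end (and so returns None) cannot have its result sent back to the client.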

Note

The lambda defines a function object whose body is a single expression. See Explanation of less familiar Python statements from Topic 3 – Web.
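
As a quick reminder, the lambda registered by the server is equivalent to a small named function. The names add_underscore and add_underscore_def below are invented for the comparison.

```python
# The lambda form, as passed to register_function in the server code.
add_underscore = lambda astr: '_' + astr


# The same behavior written as an ordinary function definition.
def add_underscore_def(astr):
    return '_' + astr


print(add_underscore('abc') == add_underscore_def('abc'))
```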

# Client code
import xmlrpclib

server = xmlrpclib.Server('http://localhost:8000')
print server.chop_in_half('I am a confident guy')
print server.repeat('Repetition is the key to learning!\n', 5)
print server._string('<= underscore')
server.set_str('I will be back')
print server.get_str()
server.set_list(['I', 'like it!'])
print server.join()
print server._privateFunction() # Raises xmlrpclib.Fault: not exposed by the server

This is what gets displayed from the server:

timlinux:~/np/Source_code/Topic_3-Web/xml_rpc> python locServ.py
localhost.localdomain - - [03/Jul/2009 19:46:09] "POST /RPC2 HTTP/1.0" 200 -
localhost.localdomain - - [03/Jul/2009 19:46:09] "POST /RPC2 HTTP/1.0" 200 -
localhost.localdomain - - [03/Jul/2009 19:46:09] "POST /RPC2 HTTP/1.0" 200 -
localhost.localdomain - - [03/Jul/2009 19:46:09] "POST /RPC2 HTTP/1.0" 200 -
localhost.localdomain - - [03/Jul/2009 19:46:09] "POST /RPC2 HTTP/1.0" 200 -
localhost.localdomain - - [03/Jul/2009 19:46:09] "POST /RPC2 HTTP/1.0" 200 -
localhost.localdomain - - [03/Jul/2009 19:46:09] "POST /RPC2 HTTP/1.0" 200 -
localhost.localdomain - - [03/Jul/2009 19:46:09] "POST /RPC2 HTTP/1.0" 200 -
Traceback (most recent call last):
    server.serve_forever()
KeyboardInterrupt

So each call from the client resulted in an HTTP POST request to the server, and the server replied with an XML document containing the return value.

To stop the server, we send a KeyboardInterrupt, which should be Control-c on both Unix and Windows. If Control-c fails on Windows, try Control-Break.

This is the output from running the client:

timlinux:~/np/Source_code/Topic_3-Web/xml_rpc> python locClient.py
I am a con
Repetition is the key to learning!
Repetition is the key to learning!
Repetition is the key to learning!
Repetition is the key to learning!
Repetition is the key to learning!

_<= underscore
I will be back
I like it!
Traceback (most recent call last):
    ...
xmlrpclib.Fault: <Fault 1: '<type \'exceptions.Exception\'>
:method "_privateFunction" is not supported'>

This example is obviously pretty simple, but it shows that, with very little overhead, a function can be defined to run on an XML-RPC server on the other end of a network connection from our client.
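
For readers on Python 3, where the modules are named xmlrpc.server and xmlrpc.client, a trimmed-down version of the same idea can be run in a single script. This is a sketch under several assumptions that are not part of the original example: the server runs in a background thread, port 0 lets the operating system pick a free port, request logging is turned off, and run_demo is an invented helper name.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


class StringFunctions:
    def _private(self):
        # Not reachable through register_instance: the name starts with '_'.
        pass

    def chop_in_half(self, astr):
        return astr[:len(astr) // 2]

    def repeat(self, astr, times):
        return astr * times


def run_demo():
    """Start a server, make three calls, and return what came back."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), allow_none=True,
                                logRequests=False)
    server.register_instance(StringFunctions())
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    results = []
    try:
        proxy = xmlrpc.client.ServerProxy("http://127.0.0.1:%d" % port)
        results.append(proxy.chop_in_half("I am a confident guy"))
        results.append(proxy.repeat("ab", 3))
        try:
            proxy._private()
        except xmlrpc.client.Fault as fault:
            # The server reports the refused call as an XML-RPC fault.
            results.append(fault.faultString)
    finally:
        server.shutdown()
        server.server_close()
    return results


if __name__ == "__main__":
    for line in run_demo():
        print(line)
```

Catching xmlrpc.client.Fault lets the client carry on after a refused call instead of ending with a traceback.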