A Guide to Peer-2-Peer

by Jim McKeeth

Description:
This paper provides a guide to Peer-2-Peer and how to develop applications to take advantage of this technology in Delphi using both Indy and specialized third party libraries.
Abstract:
Start with an introduction to Peer-2-Peer technology, different networks and how they work.  Then look at a simple Peer-2-Peer implementation with Indy. Then a more complex implementation. Finally an implementation of connecting to the Gnutella network with an existing 3rd party library.  
Legal:
Copyright © 2003 by Jim McKeeth, All Rights Reserved.

The most recent version of this paper, along with more examples and source code can be found here:

http://www.bsdg.org/jim/Peer2Peer/

Also join the Delphi P2P mailing list here:

http://www.yahoogroups.com/group/Delphi-P2P/

Delphi-P2P-subscribe@yahoogroups.com

Table of Contents

  1. Introduction
  2. Theory
    1. What Peer-2-Peer is not
      1. Star vs. Bus
      2. Server vs. Peer
      3. Peers vs. Clients
    2. What is Peer-2-Peer?
    3. History
    4. Uses
    5. Architectures
      1. Centralized
      2. Brokered
      3. Decentralized
        1. Equal Peer
        2. Super Peer
    6. Established Networks / Protocols
      1. Napster
      2. Gnutella
      3. Freenet
      4. WASTE
      5. IRC
    7. Firewalls
      1. Centralized Networks
      2. Brokered Networks
      3. Distributed Networks
  3. Application
    1. Simple Example in Delphi with Indy

Introduction

Peer-2-Peer is considered by many to be the hot new technology of the day, while others consider it to be a blight on the hard earned money of Hollywood, the music industry and other intellectual property owners.  While the legally creative uses of this technology are undoubtedly contributing to its popularity, this paper focuses on the technology and some less controversial uses.  It is beyond the scope of this paper to debate all the other uses.  I leave the purpose, morality and legality of the technology and its uses up to the implementers, users, rights advocates, courts and legislative bodies with the hope that a solution will be implemented to protect the rights of all.  It is not that I don't have an opinion, but this is not the place to express it.

This paper will focus on the technology, the history, and implementing this in Delphi with Indy and related technologies.  There are two main sections: Theory and Application.  The Theory section is language and tool agnostic and focuses on Peer-2-Peer in general.  The Application section focuses on Delphi, Indy and related tools.  Although the examples are given in Delphi with Indy, it is hoped that someone who is sufficiently skilled in the development arts could easily adapt these examples to other tools and languages.

Theory

What Peer-2-Peer is not

If you already have a good grasp of network topologies then feel free to skip down a few sections.  The "What is Peer-2-Peer?" section is where we really start to get into Peer-2-Peer information.  The material before it is presented to help you understand traditional networks and how they relate to Peer-2-Peer networks.

Star vs. Bus

Before we get into what Peer-2-Peer technology is, let's talk a little bit about network topology.  In networking, topology is the physical or logical layout of the network.  Typical physical layout topologies include bus (linear), star (spokes off a hub) and less frequently ring (a bus with the ends joined).  To most network end users this physical layout is transparent and they are unaware of it.  This may be a bit of review for many of you, but it is important that the basics of network connectivity are fresh in your mind before we start talking about new and different ways to do things.

Physical

[Figure: Star Topology]

Star topology is probably the most common because of its resilience.  Every node (or PC) in a star topology network is connected directly to a central hub, switch or other connecting / routing device.  Because of the direct connection to the central control, if a cable goes out then only the one node is affected.  The type of central connecting device determines how information is communicated on the network.  If a switch is used then all communication is routed only from the source to the destination.  If a hub is used then all communication is broadcast to every node in the network.  The downside to a star topology is that bottlenecks can occur where all data is passing through the central hub.

[Figure: Bus Topology]

Bus topology is less resilient, but is less expensive than star because no central hub and less cable are required.  There is a main trunk cable (the backbone) that provides the connectivity.  Each node is connected to the trunk cable.  Terminators are connected to the two ends of the trunk line.  If there is a break anywhere in the cable then the entire network is brought down.  Debugging it is very similar to the old string of Christmas lights where each node (bulb or PC) must be replaced or checked to find the source of the problem.  All communication is received by every node on the network.

[Figure: Ring Topology]

The third primitive topology mentioned here is ring topology.  In ring topology each node is connected to two other nodes until an entire ring is formed, much like a bus topology where the terminators are removed and the two ends are connected.  All communication passes through each node on its way from the source to the destination.  If any node or cabling goes out, then the entire network goes out, just like with bus.  Installing and maintaining a ring topology can be complicated and expensive, but the result is a relatively high amount of bandwidth.  Another advantage is that since each node repeats the communication the network can be much larger than other topologies allow.  Ring topology is not to be confused with Token Ring, which actually uses a physical topology called star-wired ring topology.  What makes Token Ring unique is that each node is passed a token in turn, and when it has the token it is given full control of the network - similar to thread scheduling.

[Figure: Tree Topology]

It is worth noting that star and bus topologies can be, and commonly are, combined to create complex or hybrid topologies.  These can combine two of the same topology or two different topologies.  When one or more star topologies and a bus topology are combined it is typically referred to as a tree topology, with the bus topology forming the "trunk" of the tree and the star topologies branching off of it.  Many times the server will be connected directly to the trunk to reduce the number of hubs that must be passed through to reach any given node.

[Figure: Combined Star Topology]

When two star topologies are combined, two hubs (or switches) are used.  This increases the number of hops necessary to travel from one segment to another.  There is a maximum limit to the number of hops (chained hubs) that can be used before performance suffers.  In the diagram the server is attached to the hub on the right, giving it one hop access to the nodes on that same hub, but communication must pass through the second hub in order to reach the other segment, which actually has more nodes.

As a note, connecting a network to an entirely separate network is accomplished with a router.  For our purposes just imagine a router as a complex switch that decides if communication is destined for the local network (Local Area Network or LAN) or the external network (Wide Area Network or WAN).  Routers act as the gateways connecting LANs to WANs and other LANs.

Logical

So far we have used star and bus to describe physical layouts, but they can also describe logical layouts.  The logical topology (or layout) is another layer on top of the physical one.  It is important to note that the logical layout can be different than the physical layout thanks to abstraction.  By building on the foundation laid at the physical level more complex and creative topologies are possible.

[Figures: True Mesh; Mesh]

A new type of topology that we will introduce as a logical one is the mesh topology.  Mesh is a highly redundant topology with multiple connections between each node.  The multiple connections allow for more than one way to reach any given node.  A true mesh would have a direct connection between every node and every other node.  One advantage of mesh topology is that communication can take place directly between any two nodes without an intermediary.  The main advantage, though, is that if a cable or node is knocked out then the rest of the network carries on.  Only that which is directly removed is affected.

[Figures: Complex Mesh; Adding one node to a True Mesh]

Mesh topologies could exist physically, especially on a small scale, but they are impractical because of the connectivity requirements.  This is especially clear when you consider adding just one new node to a true mesh.  The process requires a new connection (or cable in a physical topology) from this one node to every existing node.  In our example six new connections are required just to add the one new node.
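
The growth in connections can be checked with a little arithmetic.  The following is a minimal sketch in Python (chosen only for illustration; the function names are hypothetical and not from any library).  It assumes a true mesh where every pair of nodes shares exactly one connection.

```python
def full_mesh_links(node_count: int) -> int:
    """Total connections in a true mesh: one per pair of nodes."""
    return node_count * (node_count - 1) // 2


def links_to_add_node(existing_nodes: int) -> int:
    """Connections needed when one new node joins a true mesh.

    The newcomer needs one connection to every node already present.
    """
    return existing_nodes


if __name__ == "__main__":
    for nodes in (6, 7, 100):
        print(nodes, "nodes need", full_mesh_links(nodes), "connections")
```

Going from six nodes to seven adds six connections, matching the example above, while a true mesh of 100 nodes would already need 4950 of them.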

Mesh topologies get interesting when you have complex networks involving many nodes.  Imagine multiple mesh topologies combined into a single mesh topology.  At this point every node is no longer connected to every other node, but there are multiple routes between any two points.  This allows for entire sections of the network to fail without the remaining nodes losing connectivity.

[Figure: Simple example of the complex mesh that makes the Internet]

This complex mesh topology is similar to the Internet, both at a physical and logical level.  At the physical level many nodes (especially hosting providers) have multiple connections to other nodes.  It is important to note that a single node is more often a router connected to multiple computers than just a single computer.  As mentioned above, it would be impractical to have every node (especially every PC) connected to every other node, but if nodes are joined together at certain points where there are multiple paths to further destinations, then a mesh becomes more practical.

Think of it like the local roadway system where you live.  Your subdivision or neighborhood has a complex series of interconnected roads, but then just one or two roads out to the town.  Again in the town there is a complex series of interconnected roads, but just a few highways and freeway on-ramps.  Once on the freeway you can travel to another town, and if the town is far enough away, there are multiple routes available for you to get there.  Think of the freeway on-ramps as routers and the entrances to the neighborhoods as hubs.

For our purposes, when discussing logical topologies we will refer to a bus (or linear) topology when you must pass through a series of intermediary nodes to get from point A to point B.  When it is possible to jump from a node to any other node thanks to a routing intermediary we will refer to that as star topology.  And when you can jump directly from any single node to any other node that will be referred to as a mesh (or true mesh) topology.

For example the logical layout of the Internet (TCP/IP) can be viewed as similar to a bus topology.  This is illustrated by running a traceroute - a connection from one point to another goes through many points along the way.  The fact that there were other routes is irrelevant to the individual traceroute.  HTTP is an abstraction on top of TCP/IP which can be viewed as a star topology.  When you visit www.borland.com you do not need to pass through any other web sites (not at the HTTP level) thanks to the routing intermediaries.  This is true of most high level TCP/IP protocols.

Also in a logical layout the physical distance between nodes has less impact.  Data travels at close to the speed of light, so it may appear to be a direct connection when in fact it is traveling through other physical nodes.  The distance traveled, and number of hops to get there, does have an impact on response or ping times.  Interestingly some ping times will be lower for much farther nodes than for closer ones.

TCP/IP and the Protocols that ride it
The Internet with TCP/IP is very much a bus topology, with all connections passing through many other nodes to get from one point to another.  The physical network may be in a star, bus or other topology altogether; the underlying protocols take care of that for us.  When we are using a protocol that rides on top of (or abstracts) TCP/IP, like HTTP, SMTP, etc., then we really don't worry about all this routing and underlying topology, we just accept that it will work.  So when we are using one of these riding protocols we are in a logical star topology.  The client just connects to the server.  As long as TCP/IP and the physical network are working as they should we don't need to be concerned with them while surfing the web or sending e-mails.

Joel Spolsky of Joel on Software has a good article on The Law of Leaky Abstractions available here: http://www.joelonsoftware.com/articles/LeakyAbstractions.html

Server vs. Peer

Typical network configurations include server and peer networks.  In server (or Client / Server) networks there are one or more central servers that choreograph all networking activity between all the machines.  There is a hierarchy with the server being in charge.  Thanks to technologies like DHCP a server may actually assign network addresses to the machines on the network.  Technologies like DNS allow one client to connect to another using a common name without needing to know the machine's actual address.  Servers and server technologies are more common in a business environment or when there are many computers.  A server network performs better and is easier for new clients to connect to, as long as the central server(s) are operating correctly.  The server provides a central location allowing the entire network to be administered from one location (or at least that is the theory).  The server and administrator do the extra work to make it easier on the clients and end users.  Servers are costly, both in time and money, so the more computers to be serviced by a server, the more cost effective it is to have one.

In a peer network all the machines are equal, with each machine discovering what other machines are on the network and obtaining an address without the help of a server.  A peer network does not require a special central server; this makes it more appealing to small networks, especially ones found in the home.  The downside is that it can take a while for a machine to find another machine on the network, and the burden of connecting correctly and obtaining an address is placed on each client.  If a peer network is going to be very large then an administrator may be required to keep things running smoothly.  Unfortunately there is no central server for them to administer, so they will be required to administer each machine on the network.  (Many network administrators will state that they already are physically administering each machine on the network, but the theory is that centralized servers make the administration easier.)

There are also many variations and combinations of the above server / peer configurations.  Sometimes a router will provide DHCP type services to a peer network, which blurs the line between the two.  The main difference is that in a server network, the server is in charge and brokers the configuration, while in a peer network the peers are equal and they make up the network.

Peers vs. Clients

The Internet started out with a bunch of always on computers that were both servers (provided content and services) and clients (consumed content and services).  If someone wanted to publish something, they published it on their machine (or one within their physical location).  Things were balanced because equal amounts of traffic flowed both directions - all the nodes were peers.

With the advent of the World Wide Web the masses became involved.  They used web browsers to consume without producing.  Thanks to their dial-up Internet access they became "transient" clients, coming and going without any commitment.  They were only connected for small periods of time and typically were connected at different addresses (thanks to DHCP).  Even if they did want to produce and serve content they were connected at such low speeds that they couldn't service many clients.  Eventually most consumption occurred by transients.  A small group of powerful servers did all the production while a large group of transient clients did all the consumption.  If someone wanted to publish something they typically had to seek out a server located elsewhere - very rarely was it their machine, or even at their physical location.  The Internet evolved to suit this model of consumption.

In recent history, always-on broadband is swinging the tide back to the Internet's roots.  End users are connected most of the time, at higher speeds, and usually at the same address (maybe still using DHCP, but with longer leases).  This, in combination with the growth in service-providing transients, adds new possibilities to the way the Internet works and how we, as end users, communicate.

The Stack of Protocols we use

When we make use of the Internet we use a lot of different protocols, all stacked on top of each other.  Here is a chart showing the protocols typically used when browsing the web.

Protocol    Purpose                Topology
Ethernet    Network connections    Star or Bus
PPP         Typical in dial-up     Star
TCP/IP      Routing                Bus
HTTP        Web pages              Star

TCP/IP Illustrated, Volume 1 by Richard Stevens is recommended for more information about Internet protocols.

How does this relate to network topology?  The Internet started out with all the content providing machines at similar levels (ignoring routing intermediaries that provide services like DNS and DHCP).  This would be the peer model as mentioned earlier.  Gradually with time more transients connected to the network.  This moved to more of a server model, with multiple clients connecting to it.  The common relationship is Client to Server.  Although we are still on a bus topology at the lower level, the functional topology is closer to star; each client connects to the server and is agnostic to all the routers they pass through on the way. 

What is Peer-2-Peer?

[Figure: Traditional Client / Server Network Activity]

Today with faster always-on connections a new type of connectivity is on the rise.  This connectivity is between the individual clients.  Instead of calling it client to client we call it peer to peer (AKA Peer-to-Peer or P2P), since for these connections the two ends are equals or peers, while client indicates subservience to a server.  The fact that a central server may have made the initial connection to the Internet (or some other network) possible is irrelevant to where the final activity takes place.  The outcome is that all the action is in the peers, or the fringes of the network.  If you want to publish something, you do so on your own machine, instead of on an external server.  Peer-2-Peer is a new abstraction on top of the current Internet structures.

[Figure: Peer-2-Peer Network Activity]

The days of needing a server to host information for an end user are gone.  Client machines are now peers on the network, with everyone able to host and share their own information without dependence on centralized servers.

[Figure: Distributed Network Activity]

It is important when defining Peer-2-Peer to remember "Peer-2-Peer is what Peer-2-Peer does."  In other words, don't try so hard to define it that the definition doesn't fit the applications.  The key is the location of the activity - where things are interesting.  If all the interesting stuff happens in a central server, then it is not Peer-2-Peer.  If the activity is in the fringes, then it is Peer-2-Peer.

By this definition, distributed applications like SETI@Home are Peer-2-Peer, and to a lesser extent so are chat systems (like ICQ, IRC, Jabber, etc.).  What distinguishes distributed applications from Peer-2-Peer applications is communication between peers.  Distributed applications communicate with a central server, and not with other peers.

Always remember: Peer-2-Peer is a technology, not an application.  Think of Peer-2-Peer the same way you think of your application's GUI or database functionality.  These are technologies that allow your application to do its task.  Peer-2-Peer is a means to an end, not the destination.

History

This section doesn't attempt to be a complete history of Peer-2-Peer networking.  Such a history would also cover DNS and Usenet as well as more of the latest Peer-2-Peer technologies.  This technology continues to evolve quickly.

Internet Relay Chat (IRC) predates the popularization of  Peer-2-Peer by a long time.  The origin of IRC can be traced back to the Department of Information Processing Science at the University of Oulu ( http://www.oulu.fi/english/index.html ), Finland during the latter part of August 1988.  IRC consists of distributed servers that relay chat information between each other.  A set of these servers is called a net.  A user, or client, connects to one of these servers and joins a channel (like a chat room).  Once they are in a channel their chat is relayed to every other client in that channel on that net.  There are hundreds of established nets available and a given net can have thousands of channels.  

The popularization of, and current innovation with, Peer-2-Peer can be traced to Napster.  In May of 1999 Napster went online, offering end users the ability to connect and share their favorite music directly with other end users.  There was no worry about finding a server to upload the music to, or about the limits and rules that come with one.  About this time broadband connections, mostly DSL, were very popular in people's homes.  Most everyone else had 56K dial-up connections.  Life was good for the music sharers.

Almost immediately Napster started to face legal challenges.  From a technical standpoint it was pointed out that since Napster had a single central server it could be shut down or regulated.  At this point there were many clones of Napster.  Many were the result of reverse engineering the client and protocol for compatibility and improvements, while others were the same idea, "only better".  All had the same architecture: one central server with multiple clients connected to it.  The central server facilitated the client connections and the searches.  Once the desired song was found the server then facilitated a direct connection between the two clients, so they could make a peer to peer connection.

In current news Napster is actually returning.  It is currently owned by Roxio and will begin offering a fee-based music service similar to iTunes on October 29th, 2003.

The next major news making innovation, and arguably the first mass accepted, true Peer-2-Peer technology is Gnutella.  Originally written by Justin Frankel and Tom Pepper (of WinAmp / Nullsoft fame), Gnutella was invented and released in March 2000.  It didn't take long before their employer, AOL, condemned and removed it as an "Unauthorized Freelance Project".  The move was most likely influenced by Time Warner, a music publisher that AOL had recently purchased.

Fortunately for the Peer-2-Peer community the cat was out of the bag.  Gnutella didn't come with source code or any documentation, but a group of very talented individuals reverse engineered the protocol and started making new versions and improvements, releasing almost all of the work as open source.  The open source development really helped the evolution of the technology.

Many people were critical of Gnutella and argued (and continue to argue) that it didn't scale well and that searches could take a while since each node had to report its results independently.  About this time we see the rise of FastTrack.  FastTrack is not an application, but a technology for building other applications.  Sharman Networks' Kazaa and MusicCity's Morpheus are two such applications (although Morpheus later changed what it used).  Unlike Gnutella, FastTrack was a closed protocol with developers working for the company dedicated to developing it.  MusicCity wasn't new to Peer-2-Peer networking; they were veterans from the Napster clone era.  FastTrack could be seen as combining Napster (fast searches) and Gnutella (more distributed architecture).  It offers a Super Peer Decentralized network (see below).

Also in the mix, based on a July 1999 paper, is Freenet.  Unlike all the other networks, Freenet does not fall under the traditional "file sharing" header.  Freenet is motivated to create a free speech network.  This results in a drastically different architecture that focuses on anonymity and not file searches.  Freenet offers many creative solutions to anonymity and persistence of information.  A "Democratic" network is used to determine what files remain on the network, and where they are located.  Freenet is still in the process of being realized and completed.  The latest release (as of November 2003) is version 0.6.

WASTE is the latest entry into the mass public Peer-2-Peer arena.

In the Established Networks / Protocols section we will take a closer look at these networks and how they work. 

Uses

So why would someone want to implement a Peer-2-Peer system beyond sharing their favorite music with friends?  A lot of the excitement with this technology is that it is such a change from the technologies of the time.  The thought is that since it is so "radical" it must be useful, and whoever can be the first to find a use that is not legally challenged and generates revenue wins the prize.

Possible uses include:

Architectures

Just as networks have specific topologies and architectures, so do different Peer-2-Peer systems.  As you will see, they are similar to the network topologies discussed earlier.

Centralized

The centralized system makes use of a single server that all communication passes through.  Think of this as a star topology - all the nodes connect to the central hub.  Many may argue that this is not a Peer-2-Peer system at all since the peers do not communicate directly.  Common examples are SETI@Home or most chat programs.  The advantage of centralized is that the server is a known location and is easy to connect to.  Since everything passes through the server, caching can be used to increase performance.  There is no network fragmentation since there is only one level of connections.  The disadvantage is that communication is bottlenecked at the central server, and if that server goes down then the entire network goes down.

Brokered

A brokered system still makes use of a central server (or a small number of them), but uses it to arrange connections between peers.  This server may provide other services to aid in the matching of the peers.  It is kind of like a dating service for computers.  Once the match is made the individual nodes communicate directly.  Each node only ever knows about the central server and any other nodes that the central server introduces it to.  Beyond those few nodes the rest might as well not exist.  Examples of brokered include Napster and chat networks that allow direct connection.  The advantage of brokered is that it has the performance of centralized but also allows direct peer connections to relieve some of the bandwidth constraints of the central server, while indexes can still be searched on the central server.

Decentralized

This is the true Peer-2-Peer architecture - no servers, just peers.  A new node connects to an existing node that then introduces it to some of the other nodes.  No specific central server node coordinates the network.  Two separate decentralized systems could be joined by a single node that connects to both of them.  Once that node introduces nodes from the two systems to each other then more connections between the systems can be made.  Gnutella and Freenet are examples of decentralized networks.  The advantage of decentralized is that there is no weak point that can take out the entire network at once.  It is very much a grassroots network.  The disadvantage is that a query of the network must make many hops to reach many nodes, which takes much longer.  If the network is very large then you will most likely never query the entire network.

Equal Peer

If decentralized is true Peer-2-Peer, then equal peer decentralized is the truest and purest.  No node is any more important than any other; there is total equality on an architecture level.  Every node is connected to a few other nodes as an equal peer, acting as both client and server.

Super Peer

[Figure: Decentralized - Super Peer]

Super peer is a hybrid between brokered and decentralized, but would still be considered a sub-type of decentralized.  In a super peer system, if a specific node has certain environmental advantages (good connection speed, high visibility and long uptime) then it acts as a super peer (or super node).  As a super peer it behaves like a mini brokered architecture server within a larger decentralized network.  Each peer (including super peers) would be connected to one or more other super peers.  If a single super peer is lost then the remaining super peers (that are typically already connected) pick up the slack until a new super peer takes its place.  So while remaining decentralized this has the advantages of a brokered system, although it is still not as fast.
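
How a node decides it has those "environmental advantages" varies by network.  The following is only a sketch in Python of the kind of test involved: the `PeerStats` fields, the `can_be_super_peer` function and both threshold values are hypothetical, chosen for illustration rather than taken from Gnutella, FastTrack or any other protocol.

```python
from dataclasses import dataclass


@dataclass
class PeerStats:
    bandwidth_kbps: int      # measured connection speed
    accepts_inbound: bool    # "high visibility": reachable from outside
    uptime_hours: float      # how long the peer has stayed connected


def can_be_super_peer(stats: PeerStats,
                      min_kbps: int = 256,
                      min_uptime_hours: float = 2.0) -> bool:
    """Return True when a peer meets all three (assumed) criteria."""
    return (stats.bandwidth_kbps >= min_kbps
            and stats.accepts_inbound
            and stats.uptime_hours >= min_uptime_hours)


if __name__ == "__main__":
    dsl_peer = PeerStats(bandwidth_kbps=768, accepts_inbound=True, uptime_hours=36)
    dialup_peer = PeerStats(bandwidth_kbps=56, accepts_inbound=True, uptime_hours=1)
    print(can_be_super_peer(dsl_peer), can_be_super_peer(dialup_peer))
```

A peer that fails any one test, such as a fast machine hidden behind a firewall, simply stays an ordinary peer attached to one or more super peers.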

Established Networks / Protocols

There are many more systems than those that are or could be mentioned here.  These were chosen as common systems that provide an example of the architectures defined above, not because they are any better than the many other systems out there.

Napster

[Figure: Napster Central Server Model]

Napster is a Brokered system.  Every node connected to the central Napster server (may it rest in peace) and told the server what files it had available and how other peers could reach it.  When a node was looking for a specific file or peer it asked the central server, which responded with a listing of files and/or connection information.  At that point the two peers connected to each other to transfer the file.
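
The broker's role described above amounts to keeping an index of who has what.  Here is a minimal in-memory sketch in Python; the `Broker` class and its method names are hypothetical and this is not the Napster protocol, only the general shape of a brokered index.  Note that the broker hands back addresses and never touches the files themselves.

```python
from collections import defaultdict


class Broker:
    """Hypothetical central index: knows who shares what, relays no files."""

    def __init__(self):
        # file name -> set of peer addresses currently sharing it
        self._index = defaultdict(set)

    def register(self, peer_address, file_names):
        """A peer announces the files it shares and how to reach it."""
        for name in file_names:
            self._index[name].add(peer_address)

    def unregister(self, peer_address):
        """A peer disconnects, so drop it from every listing."""
        for holders in self._index.values():
            holders.discard(peer_address)

    def search(self, file_name):
        """Return the addresses of peers sharing file_name (sorted)."""
        return sorted(self._index.get(file_name, ()))


if __name__ == "__main__":
    broker = Broker()
    broker.register("10.0.0.5:6699", ["song.mp3", "notes.txt"])
    broker.register("10.0.0.9:6699", ["song.mp3"])
    # The searching peer would now connect directly to one of these.
    print(broker.search("song.mp3"))
```

The single `Broker` object is also the weak point of the design: if it disappears, no peer can find any other.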

Gnutella

Originally Gnutella was an Equal Peer Decentralized architecture.  Any new node could connect to any existing peer in the network and then have access to the network.  Once connected to a peer it then had access to every peer connected to that peer, which continues out in a ripple effect until the Time To Live (TTL) expires.  The time to live gives each query a finite life span of a specific number of jumps to prevent the entire network from being saturated by long living queries.
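
The ripple effect and the TTL cut-off can be simulated in a few lines.  This is a sketch in Python of TTL-limited flooding in general, not of the Gnutella wire protocol: the `flood_query` function and the sample `links` map are made up for illustration, and the simulation assumes each peer forwards a query only the first time it sees it.

```python
from collections import deque


def flood_query(links, origin, ttl):
    """Return {peer: hops} for every peer a query reaches within ttl hops.

    links -- {peer: [directly connected peers]}
    """
    reached = {origin: 0}
    queue = deque([origin])
    while queue:
        peer = queue.popleft()
        hops = reached[peer]
        if hops == ttl:
            continue  # time to live used up: this peer does not forward
        for neighbor in links.get(peer, ()):
            if neighbor not in reached:  # ignore a query already seen
                reached[neighbor] = hops + 1
                queue.append(neighbor)
    return reached


# A small made-up network: A-B, A-C, B-D, C-D, D-E, E-F
links = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D", "F"],
    "F": ["E"],
}

if __name__ == "__main__":
    print(sorted(flood_query(links, "A", ttl=2)))
    print(sorted(flood_query(links, "A", ttl=4)))
```

With a TTL of 2 the query from A stops at D, so E and F never see it; raising the TTL to 4 reaches the whole sample network at the cost of more traffic.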

With time Gnutella evolved to the Super Peer Decentralized architecture.  This helped answer the issues of scalability and slow search speeds.  With this system a new node connects to one or more super peers and once connected has access to all peers (and super peers) connected to that super peer, continuing out in a ripple pattern until the time to live expires.

[Figures: Gnutella's Mesh Nature and Long Links; Gnutella poor link setup]

Gnutella's major advantage over, and difference from, Napster era systems is that there is no central server.  For a user to connect to a Gnutella network they only need to know the address of one other machine on the network.  Once that connection is made then that node will discover other nodes on the network until a few connections are made.  Ideally the connections should be as far apart as possible (see diagram).  The more hops an alternative route between two nodes would take, the more valuable a direct link between those nodes is.  If any node in the network goes down, then the nodes that were connected to that one just connect to a different node.  This redundant nature is very similar to the basic redundant nature of the Internet itself.

Since no central server is required, there can be multiple unconnected Gnutella networks.  This would allow a corporation to set up a Gnutella network within their firewalls.  Then they could regulate what the network is used for and not need to worry about outsiders accessing it.  

Freenet

As mentioned earlier, Freenet is unique among Peer-2-Peer systems.  It is similar to Gnutella in communication architecture, but the content of the network does not stay put.  Each time a file is requested, a copy is made on the nodes closer to the requesting node.  This makes the file quicker to reach the next time it is requested from the same location.

WASTE

The folks at Nullsoft are at it again with the release of WASTE.  Once again AOL removed their release shortly after it was made available.  This time the source code was also released under the GPL and a project was established on SourceForge.net.  WASTE is a mesh-based workgroup tool that allows for RSA encrypted communication between small workgroups of users.  The network is actually a partial mesh, with every possible connection made, limited by firewalls and routers.  Communication is then routed over the network along the route of lowest latency, which allows communication between firewalled peers via a non-firewalled peer.

IRC

Typical IRC use for chatting is a distributed centralized network - all communication passes through a set of distributed centralized servers.  There is also a Client-To-Client Protocol (CTCP) available, which clients use to negotiate DCC (Direct Client-to-Client) sessions.  With DCC, whether for chatting or for file transfers, two clients connect directly together, bypassing the central servers.  Sometimes IRC networks will split due to technical difficulties.  Clients connected with DCC are immune to such splits.  Using DCC makes IRC into a brokered network, much like Napster, with the centralized server(s) introducing the peers to each other.

Firewalls

Firewalls and routers are two different devices technologically, but they have a similar "one way" effect on network traffic.  Routers allow multiple internal machines to share a single IP address and connection out to the external network (Internet).  Firewalls are very specific in what they allow to pass into and out of a network.  Many times a router includes a firewall, or firewall functionality.  In most configurations both firewalls and routers allow most or all traffic to flow out of a network and little or no traffic to flow in.  For the purposes of the discussion here they will be referred to interchangeably.  The deep technical details of configuring and differentiating the two are beyond the scope of this paper.  

[Figure: Firewall allowing one way traffic]

Firewalls provide many different types of security.  Typically, on a fairly relaxed network, any internal client can connect out to any external server on any port.  Some networks may restrict certain ports or even certain servers, but that is more of an administrative choice than a network technology issue.  Through the use of NAT (Network Address Translation) an external connection can be opened to a specific internal machine.  Without this configuration it is unusual for an external client to be able to connect to an internal server.  The downside of NAT is that it requires configuration of the firewall for each port and machine to receive a connection, and each port can only be assigned to one internal machine.  In this section we will assume that the firewalls allow all traffic to pass out of a network and no traffic to pass into the network, which is a very common configuration.  For most client server applications on the Internet this isn't a big deal, and is in fact a great feature to secure a network. 

[Figure: Firewall Effects on Peer Connections]

For Peer-2-Peer, where two peers are trying to make a connection through a firewall, this is a big deal.  If the peer behind the firewall is connecting out to the other peer then this will work fine.  But if the peer outside the firewall is trying to connect through to the peer inside the firewall it will not be able to.  If both peers are behind different firewalls then neither will be able to connect directly to the other. 

There are a couple of solutions to this dilemma beyond changing your network configuration.  The solutions depend on what type of Peer-2-Peer network we are working within. 

Centralized

For a centralized network the individual peers do not connect directly to each other.  They all connect out to the central server.  As long as the central server is available as a server then there is no problem.  So the central server could be behind the same firewall as all the clients (as seen in a corporate LAN setup).  If the server were behind another firewall, then the firewall would need to be configured to allow external traffic to connect to it.

Brokered

[Figure: Traversing a Firewall with Push Requests]

For a brokered network, the configuration of the central server would need to be the same as in a centralized network.  When an outside peer needs to connect to an individual peer (client) behind a firewall, the outside peer routes a "push" request through the central server to the target peer.  When a peer receives a push request it connects out to the peer that is trying to reach it, reversing the direction of the connection.  Most protocols (e.g. Gnutella, Napster, and FastTrack) use a method similar to this to traverse firewalls.   

If both peers are behind different firewalls then an intermediary is needed.  This intermediary would tunnel the connection between the two peers.  The intermediary could be a central server or a peer that both other peers can connect to.  Many networks do not use the intermediary since that intermediary could end up consuming a large portion of the peer's bandwidth.  
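
The three cases discussed so far (direct connection, push request, and tunneling through an intermediary) boil down to a small decision.  The following is a minimal sketch of that decision only; the TConnectPlan type and the PlanConnection function are hypothetical names for this example, and how a peer learns whether it is firewalled is left out entirely.

```pascal
program ConnectPlanSketch;
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Hypothetical names for the three cases described in the text
  TConnectPlan = (cpDirect, cpPush, cpRelay);

const
  PlanNames: array[TConnectPlan] of string = (
    'connect directly',
    'send a push request so the target connects out',
    'tunnel through an intermediary');

function PlanConnection(RequesterFirewalled,
  TargetFirewalled: Boolean): TConnectPlan;
begin
  if not TargetFirewalled then
    Result := cpDirect   // the target can accept incoming connections
  else if not RequesterFirewalled then
    Result := cpPush     // reverse the direction: the target connects out
  else
    Result := cpRelay;   // neither side can accept a connection
end;

var
  r, t: Boolean;
begin
  // Print the plan for all four combinations
  for r := False to True do
    for t := False to True do
      WriteLn('requester firewalled=', r, ' target firewalled=', t, ': ',
        PlanNames[PlanConnection(r, t)]);
end.
```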

Distributed

This is very similar to brokered, except that instead of sending push requests through a central server, the request is sent back through the peers that the query traveled through.  

Freenet is an example of a distributed network that sends file transfers through intermediaries.  In fact all transfers are through intermediaries.  This provides the anonymity that is crucial to the design goals of Freenet.  Since the requester never connects to the source, the requester never knows where the file originally came from, which protects the identity of the source.  The downside is speed - it takes much longer to move a file through all the nodes instead of transmitting it directly.

Routed

For networks that support routing (Freenet and WASTE), firewalls are traversed much more easily.

Application

The programming examples will concentrate mostly on distributed networks, but will also look at brokered ones.  

Simple Example in Delphi with Indy

The simplest Peer-2-Peer application will have a single client socket (connecting out) and a single server socket (accepting incoming connections).  A server socket can accept multiple incoming connections while a client socket can only make a single outgoing connection, so the next, more advanced application would have multiple client sockets and a single server socket.  

Peer-2-Peer Chat

For this first example we will use one TIdTCPClient and one TIdTCPServer socket.  The simplest p2p chat would only allow a total of two chatters.  Not very exciting.  This example will allow an unlimited (theoretically) number of nodes.  It will be completely decentralized, with all chats going through intermediaries rather than new peer connections.  Each message will be broadcast to the entire group and each node will re-broadcast the message if it has not previously broadcast it.  There is no Time To Live, so each message will travel the entire network regardless of size.

[Figure: p2pChat form]

Take a form named p2pChatForm, add a TIdTCPClient named ClientSocket and a TIdTCPServer socket named ServerSocket.  Also include a TIdAntiFreeze to keep the UI from freezing.  Arrange 3 TEdits, a TCheckBox, 2 TButtons, 2 TMemos, a TTimer and a TStatusBar with two panels so it looks something like the form shown in the figure.  Name the TEdits ListenEdit, HostEdit and NickEdit.  Name the TCheckBox ListenCheck.  Name the TMemos LogMemo (on top) and EditMemo (on bottom).  Name your TButtons SendBtn and OutBtn.  Rename the TTimer to ClientPoll and set the interval to 100 milliseconds.  Finally name the TStatusBar StatusBar.  Use TSplitters and TPanels to get the layout you want.  

First let's add a routine to handle received messages:

procedure Tp2pChatForm.Received(const s: string);
// processes a received string
begin
  // ignore null strings and ones already received or broadcast
  if (s <> '') and (LogMemo.Lines.IndexOf(s) = -1) then
  begin
    // Show it in the log
    LogMemo.Lines.Add(s);
    // Broadcast to everyone
    DoBroadcast(s);
  end;
end;

This routine takes a single string as a parameter.  If that string is empty or already displayed in the LogMemo then it is ignored.  Otherwise the string is added to the LogMemo and then Broadcast on all connections.  It is important to check for empty strings as we will see in the client polling.  Instead of separately tracking received messages we just check to see if it is displayed in the log.  All messages will have the sender's nickname as well as the time the message is sent to reduce the odds of a duplicate.  This isn't a perfect implementation, but will suffice for our demo.

Now for the Broadcast routine:

procedure Tp2pChatForm.DoBroadcast(const s: string);
// broadcast a message to all connected nodes
var
  idx: Integer;
  lList: TList;
begin
  // send it out on the client connection if connected
  if ClientSocket.Connected then
    ClientSocket.WriteLn(s);
  // Send it to all clients connected to the server socket
  lList := ServerSocket.Threads.LockList;
  try
    for idx := 0 to pred(lList.Count) do
      TIdPeerThread(lList.Items[idx]).Connection.WriteLn(s);
  finally
    ServerSocket.Threads.UnlockList;
  end;
end;

This routine also takes a single string parameter.  If our ClientSocket has a connection going out then we send that string out on that connection.  Next we obtain a lock on the Peer Thread list for the ServerSocket by calling the LockList method, which returns the list of threads.  We iterate through the threads and write the string to each one.  Finally we unlock the list.  Since we don't actually have a new instance of the list we DO NOT free it.

Handling incoming messages on a TIdTCPServer socket is as simple as assigning an OnExecute event handler that calls our routine to handle received messages.

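procedure Tp2pChatForm.ServerSocketExecute(AThread: TIdPeerThread);
begin
  // when something comes into the server, process the line
  // (OnExecute runs in the connection's own thread; a production version
  // should synchronize with the main thread before touching LogMemo)
  Received(AThread.Connection.ReadLn());
end;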

But what about the TIdTCPClient sockets?  Client sockets are typically outgoing sockets and do not have an event for incoming data.  This is what our ClientPoll TTimer accomplishes in its OnTimer event.

procedure Tp2pChatForm.ClientPollTimer(Sender: TObject);
// Check to see if there is anything coming down
// on the ClientSocket
begin
  ClientPoll.Enabled := False;
  try
    if ClientSocket.Connected then
    try
      // try to process a line, but timeout after 100 milliseconds
      Received(ClientSocket.ReadLn(#$A, 100));
    except
      // Disconnect on error
      ClientSocket.DisconnectSocket;
      raise;
    end;
  finally
    ClientPoll.Enabled := True;
  end;
end;

By specifying a TimeOut when calling ReadLn on the TIdTCPClient we can just read data if it is available.  If no data is available then ReadLn returns an empty string.  This is why it is important that we ignore empty strings in our Received routine.  

The remainder of the application is fairly straightforward and I will leave it to you to explore the accompanying source code.  A couple of notes on the program.  Because we use the LogMemo to see if we previously broadcast a message, it is important that we have WordWrap turned off, because wrapping will change the stored line if the message is longer than one line.  The solution to this would be to keep a separate log of processed messages (with better identifiers!).  Also, this application can connect to itself if you want, or you can run multiple instances of it on the same machine.
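
One way the separate log of processed messages might look is sketched below.  It is not part of the accompanying source code: the SeenIds list, the FirstTimeSeen, MakeMessage and MessageId routines, and the '<nick>-<counter>|<text>' line format are all assumptions made for this illustration.  The idea is that Received would call something like FirstTimeSeen instead of searching the LogMemo.

```pascal
program SeenMessagesSketch;
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

var
  SeenIds: TStringList;  // sorted list of message IDs already processed

// Hypothetical wire format: '<nick>-<counter>|<text>'
function MakeMessage(const Nick: string; Counter: Integer;
  const Text: string): string;
begin
  Result := Nick + '-' + IntToStr(Counter) + '|' + Text;
end;

// Everything before the first '|' is the ID; '' if the line has no '|'
function MessageId(const Line: string): string;
begin
  Result := Copy(Line, 1, Pos('|', Line) - 1);
end;

// True the first time an ID is offered, False for a repeat or an empty ID
function FirstTimeSeen(const MsgId: string): Boolean;
var
  idx: Integer;
begin
  Result := (MsgId <> '') and not SeenIds.Find(MsgId, idx);
  if Result then
    SeenIds.Add(MsgId);
end;

var
  Line: string;
begin
  SeenIds := TStringList.Create;
  try
    SeenIds.Sorted := True;  // Find requires a sorted list
    Line := MakeMessage('Jim', 1, 'Hello, peers');
    // The first arrival should be processed, the echoed copy should not
    WriteLn('first arrival processed: ', FirstTimeSeen(MessageId(Line)));
    WriteLn('echoed copy processed: ', FirstTimeSeen(MessageId(Line)));
  finally
    SeenIds.Free;
  end;
end.
```

Because the ID is separate from the displayed text, WordWrap no longer matters and two identical messages from the same sender are still told apart by the counter.  Nicknames containing '|' would break this simple format, and since the server socket's OnExecute handler runs in its own thread, a real version would also need to protect the list with a lock.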

Connecting to Existing Networks

Gnucleus DNA

http://www.gnucleus.com/GnucDNA/home.html

Gnucleus DNA is a COM component that provides all the behind the scenes support to creating a Gnutella client.  The first example just covers some of the basics for using this component.

Gnuminous / GnucDNA Delphi

http://gnucdnadelphi.sourceforge.net/

Gnuminous is a full fledged Gnutella client written in Delphi with Gnucleus DNA.  We take a look at some of the code and what it would take to build our own Gnucleus DNA client.

GPU

http://gpu.sourceforge.net/

GPU is an open source (GPL) Gnutella client for sharing files and CPU resources, with a goal of building a supercomputer.  It is a pure Delphi implementation.

MsgConnect

http://www.msgconnect.com/

MsgConnect is a library available in pure Delphi, or a number of other languages (C++, Java, C#) for building Peer-to-Peer applications on the Windows, Linux, Java and Palm platforms.  An ActiveX and DLL are also provided for other non-native supported languages.  MsgConnect is a commercial library provided by EldoS, and can be licensed under GPL (thus making your application GPL) or a standard commercial license.  
