Peer-2-Peer is considered by many to be the hot new technology of the day, while others consider it to be a blight on the hard-earned money of Hollywood, the music industry and other intellectual property owners. While the legally creative uses of this technology are undoubtedly contributing to its popularity, this paper focuses on the technology and some less controversial uses. It is beyond the scope of this paper to debate all the other uses. I leave the purpose, morality and legality of the technology and its uses up to the implementers, users, rights advocates, courts and legislative bodies, with the hope that a solution will be implemented to protect the rights of all. It is not that I don't have an opinion, but this is not the place to express it.
This paper will focus on the technology, the history, and implementing this in Delphi with Indy and related technologies. There are two main sections: Theory and Application. The Theory section is language and tool agnostic and focuses on Peer-2-Peer in general. The Application section focuses on Delphi, Indy and related tools. Although the examples are given in Delphi with Indy, it is hoped that someone who is sufficiently skilled in the development arts could easily adapt these examples to other tools and languages.
If you already have a good grasp of network topologies then feel free to skip down a few sections. The What is Peer-2-Peer section is where we really start to get into Peer-2-Peer information. This information is presented to help you understand traditional networks and how they relate to Peer-2-Peer networks.
Before we get into what Peer-2-Peer technology is, let's talk a little bit about network topology. In networking, topology is the physical or logical layout of the network. Typical physical layout topologies include bus (linear), star (spokes off a hub) and less frequently ring (a bus with the ends joined). To most network end users this physical layout is transparent and they are unaware of it. This may be a bit of review for many of you, but it is important that the basics of network connectivity are fresh in your mind before we start talking about new and different ways to do things.
Star topology is probably the
most common because of its resilience. Every node
(or PC) in a star topology network is connected directly
to a central hub, switch or other connecting / routing
device. Because of the direct connection to the
central control, if a cable goes out then only the one
node is affected. The type of central connecting device
determines how information is communicated on the
network. If a switch is used then all communication
is routed only from the source to the destination.
If a hub is used then all communication is broadcast to
every node in the network. The downside to a star
topology is that bottlenecks can occur where all data is
passing through the central hub.
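The difference between a hub and a switch can be sketched in a few lines of Python. This is only an illustration: the function name and node names are made up, the hub is assumed not to echo a frame back to its sender, and the switch is assumed to already know which port the destination is on.

```python
def deliver(device, nodes, source, destination):
    """Return the nodes that receive a frame sent from source to destination."""
    if device == "hub":
        # A hub repeats the frame to every node except the sender.
        return [n for n in nodes if n != source]
    if device == "switch":
        # A switch forwards the frame only to the destination.
        return [destination]
    raise ValueError(f"unknown device: {device}")


nodes = ["A", "B", "C", "D"]
print(deliver("hub", nodes, "A", "C"))     # every other node sees the frame
print(deliver("switch", nodes, "A", "C"))  # only C sees the frame
```

With the hub, nodes B and D receive traffic that was never meant for them, which is why a hub-based star behaves much like a bus from the point of view of each node.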
Bus topology is less resilient,
but is less expensive than star because no central hub and
less cable are required. There is a main trunk cable
(the backbone) that provides the connectivity. Each
node is connected to the trunk cable. Terminators
are connected to the two ends of the trunk line. If
there is a break anywhere in the cable then the entire
network is brought down. Debugging it is very
similar to the old string of Christmas lights where each
node (bulb or PC) must be replaced or checked to find the
source of the problem. All communication is received
by every node on the network.
The third primitive topology
mentioned here is that of ring
topology. In ring topology each node is
connected to two other nodes until an entire ring is
formed. Much like a bus topology where the
terminators are removed and the two ends are
connected. All communication passes through each
node on its way from the source to the destination.
If any node or cabling goes out, then the entire network
goes out, just like with bus. Installing and
maintaining a ring topology can be complicated and
expensive, but the results are a relatively high amount of
bandwidth. Another advantage is that since each node
repeats the communication the network can be much larger
than other topologies allow. Ring topology is not to
be confused with Token Ring topology, which actually uses
a physical topology called star-wired ring topology.
What makes Token Ring unique is each node is passed a
token in turn, and when it has the token it is given full
control of the network - similar to thread scheduling.
It is worth noting that star and
bus topology can be, and commonly are combined to create
complex or hybrid topologies. These can include the
combination of two of the same topologies, or the
combination of the two different topologies. When
one or more star topologies and a bus topology are combined
together it is typically referred to as a
tree topology with the bus topology
forming the "trunk" of the tree and star topologies
branching off of it. Many times the server will be
connected directly to the trunk to reduce the number of
hubs that must be passed through to reach any given
node.
When two star
topologies are combined together
then two hubs (or switches) are used. This increases
the number of hops necessary to travel from one segment to
another. There is a maximum limit to the number of
hops (chained hubs) that can be used before performance
suffers. In this diagram to the left the server is
attached to the hub on the right, giving it one hop access
to the nodes on that same hub, but communication must pass
through the second hub in order to reach the other segment
which actually has more nodes.
As a note, when a network is connected to an entirely separate network, that is accomplished with a router. For our purposes just imagine a router as a complex switch that decides if communication is destined for the local network (Local Area Network or LAN) or the external network (Wide Area Network or WAN). Routers act as the gateways connecting LANs to WANs and other LANs.
So far we have used star and bus to describe physical layouts, but they can also describe logical layouts. The logical topology (or layout) is another layer on top of the physical one. It is important to note that the logical layout can be different than the physical layout thanks to abstraction. By building on the foundation laid at the physical level more complex and creative topologies are possible.

A new type of topology that we will
introduce as a logical one is the
Mesh topology. Mesh is a
highly redundant topology with multiple connections
between each node. The multiple connections allow
for more than one way to reach any given node
directly. A true mesh
would have a direct connection between every node and
every other node. One advantage of mesh topology
is that communication can take place directly
between any two nodes without an intermediary.
The main advantage, though, is that if a cable or node is
knocked out the rest of the network keeps working. Only
that which is directly removed is
affected.
Mesh topologies could exist physically,
especially on a small scale, but they are impractical
because of the connectivity requirements. This is
especially clear when you consider adding just one
new node to a true mesh. This process requires a new
connection (or cable in a physical topology) from this one
node to every existing node. In our example to the
left six new connections are required just to add the one
new node.
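The arithmetic behind this is easy to check. The following Python sketch (the function names are only for illustration) counts the links in a true mesh and the links needed to add one more node.

```python
def full_mesh_links(node_count):
    """Number of point-to-point links in a true mesh of node_count nodes."""
    return node_count * (node_count - 1) // 2


def links_to_add_node(existing_nodes):
    """A new node needs one link to every node already in the mesh."""
    return existing_nodes


for n in (6, 7, 50):
    print(n, "nodes need", full_mesh_links(n), "links")
print("adding a node to a 6 node mesh needs", links_to_add_node(6), "new links")
```

The link count grows with the square of the number of nodes, which is why a true mesh only stays practical on a small scale.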
Mesh topologies get interesting when you have complex networks involving many nodes. Imagine multiple mesh topologies combined into a single mesh topology. At this point every node is no longer connected to every other node, but there are multiple routes between any two points. This allows for entire sections of the network to fail without the remaining nodes losing connectivity.
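To make the redundancy concrete, here is a small Python sketch that checks which nodes can still be reached after some nodes fail. The six node network is hypothetical (two small meshes joined by two separate links), and the breadth-first search is just one simple way to test connectivity.

```python
from collections import deque


def reachable(links, start, failed=()):
    """Return the set of nodes reachable from start, skipping failed nodes."""
    failed = set(failed)
    if start in failed:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in links.get(node, ()):
            if neighbor not in failed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# Hypothetical partial mesh: A, B, C on one side and D, E, F on the other,
# joined by the links B-D and C-E.
links = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "E"],
    "D": ["B", "E", "F"],
    "E": ["C", "D", "F"],
    "F": ["D", "E"],
}
print(sorted(reachable(links, "A", failed=["B"])))       # B down: C-E still joins the halves
print(sorted(reachable(links, "A", failed=["B", "C"])))  # both joining nodes down: A is cut off
```

Losing one of the joining nodes leaves every other node connected. Only when both routes between the two halves are lost does the network split.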
This complex mesh topology is similar
to the Internet, both at a physical and logical
level. At the physical level many nodes (especially
hosting providers) have multiple connections to other
nodes. It is important to note that a single node is
more often a router connected to multiple computers than
just a single computer. As mentioned above, it would
be impractical to have every node (especially every PC)
connected to every other node, but if nodes are joined
together at certain points where there are multiple paths
to further destinations, then a mesh becomes more
practical.
Think of it like the local roadway system where you live. Your subdivision or neighborhood has a complex series of interconnected roads, but then just one or two roads out to the town. Again in the town there is a complex series of interconnected roads, but just a few highways and freeway on-ramps. Once on the freeway you can travel to another town, and if the town is far enough away, there are multiple routes available for you to get there. Think of the freeway on-ramps as routers and the entrances to the neighborhoods as hubs.
For our purposes, when discussing logical topologies we will refer to a bus (or linear) topology when you must pass through a series of intermediary nodes to get from point A to point B. When it is possible to jump from a node to any other node thanks to a routing intermediary we will refer to that as star topology. And when you can jump directly from any single node to any other node that will be referred to as a mesh (or true mesh) topology.
For example, the logical layout of the Internet (TCP/IP) can be viewed as similar to a bus topology. This is illustrated by running a trace route, where a connection from one point to another goes through many points along the way. The fact that there were other routes is irrelevant to the individual trace route. HTTP is an abstraction on top of TCP/IP which can be viewed as a star topology. When you visit www.borland.com you do not need to pass through any other web sites (not at the HTTP level) thanks to the routing intermediaries. This is true of most high level TCP/IP protocols.
Also in a logical layout the physical distance between nodes has less impact. Signals travel at close to the speed of light, so it may appear to be a direct connection when in fact the traffic is traveling through other physical nodes. The distance traveled, and the number of hops to get there, do have an impact on response or ping times. Interestingly some ping times will be lower for much farther nodes than for closer ones.
TCP/IP and the Protocols that ride it
The Internet with TCP/IP is very much a bus
topology, with all connections passing through many
other nodes to get from one point to another.
The physical network may be in a star, bus or other
topology altogether; the underlying protocols
take care of that for us. When we are using
a protocol that rides on top of (or abstracts)
TCP/IP, like HTTP, SMTP, etc. then we really don't
worry about all this routing and underlying
topology, we just accept that it will work.
So when we are using one of these riding protocols
we are in a logical star topology. The client
just connects to the server. As long as
TCP/IP and the physical network are working as they
should we don't need to be concerned with them
while surfing the web or sending
e-mails.
Joel Spolsky of Joel on Software has a good article on The Law of Leaky Abstractions available here: http://www.joelonsoftware.com/articles/LeakyAbstractions.html
Typical network configurations include server and peer networks. In server (or Client / Server) networks there are one or more central servers that choreograph all networking activity between all the machines. There is a hierarchy with the server being in charge. Thanks to technologies like DHCP a server may actually assign network addresses to the machines on the network. Technologies like DNS allow one client to connect to another using a common name without needing to know the machine's actual address. Servers and server technologies are more common in a business environment or when there are many computers. A server network performs better and is easier for new clients to connect to as long as the central server(s) are operating correctly. The server provides a central location allowing the entire network to be administered from one location (or at least that is the theory). The server and administrator do the extra work to make it easier on the clients and end users. Servers are costly, both in time and money, so the more computers to be serviced by a server, the more cost effective it is to have one.
In a peer network all the machines are equal, with each machine discovering what other machines are on the network and obtaining an address without the help of a server. A peer network does not require a special central server; this makes it more appealing to small networks, especially ones found in the home. The downside is that it can take a while for a machine to find another machine on the network, and the burden of connecting correctly and obtaining an address is placed on each client. If a peer network is going to be very large then an administrator may be required to keep things running smoothly. Unfortunately there is no central server for them to administer, so they will be required to administer each machine on the network. (Many network administrators will state that they already are physically administering each machine on the network, but the theory is that centralized servers make the administration easier.)
There are also many variations and combinations of the above server / peer configurations. Sometimes a router will provide DHCP type services to a peer network, which blurs the line between the two. The main difference is that in a server network, the server is in charge and brokers the configuration, while in a peer network the peers are equal and they make up the network.
The Internet started out with a bunch of always on computers that were both servers (provided content and services) and clients (consumed content and services). If someone wanted to publish something, they published it on their machine (or one within their physical location). Things were balanced because equal amounts of traffic flowed both directions - all the nodes were peers.
With the advent of the World Wide Web the masses became involved. They used web browsers to consume without producing. Thanks to their dial-up Internet access they became "transient" clients, coming and going without any commitment. They were only connected for small periods of time and typically were connected at different addresses (thanks to DHCP). Even if they did want to produce and serve content they were connected at such low speeds that they couldn't service many clients. Eventually most consumption occurred by transients. A small group of powerful servers did all the production while a large group of transient clients did all the consumption. If someone wanted to publish something they typically had to seek out a server located elsewhere - very rarely was it their machine, or even at their physical location. The Internet evolved to suit this model of consumption.
In recent history, always on broadband is swinging the tide back to the Internet's roots. End users are connected most of the time, at higher speeds, and usually at the same address (maybe still using DHCP, but with longer leases). This, in combination with the growth in service providing transients, adds new possibilities to the way the Internet works and how we, as end users, communicate.
The Stack of Protocols we use
When we make use of the internet we use a lot
of different protocols, all stacked on top of
each other. Here is a chart
showing the protocols typically used when
browsing the web.
TCP/IP Illustrated, Volume 1 by Richard Stevens is recommended for more information about Internet protocols.
How does this relate to network topology? The Internet started out with all the content providing machines at similar levels (ignoring routing intermediaries that provide services like DNS and DHCP). This would be the peer model as mentioned earlier. Gradually with time more transients connected to the network. This moved to more of a server model, with multiple clients connecting to it. The common relationship is Client to Server. Although we are still on a bus topology at the lower level, the functional topology is closer to star; each client connects to the server and is agnostic to all the routers they pass through on the way.
Today with faster always on connections
a new type of connectivity is on the rise. This
connectivity is between the individual clients.
Instead of calling it client to client we call it peer to
peer (AKA Peer-to-Peer or P2P) since for these connections
they are equal or peers, while client indicates
subservient to a server. The fact that a central
server may have made the initial connection to the
Internet (or some other network) possible is irrelevant in
where the final activity takes place. The outcome is
that all the action is in the peers, or the fringes of the
network. If you want to publish something, you do so
on your own machine, instead of on an external
server. Peer-2-Peer is a new abstraction on top of
the current Internet structures.
The days of needing a server to host
information for an end user are gone. Client
machines are now peers on the network, with everyone able
to host and share their own information without dependence
on centralized servers.
It is important when defining Peer-2-Peer to
remember "Peer-2-Peer is what Peer-2-Peer does." In
other words, don't try so hard to define it that the
definition doesn't fit the applications. The key is the
location of the activity - where things are
interesting. If all the interesting stuff happens in
a central server, then it is not Peer-2-Peer. If the
activity is in the fringes, then it is
Peer-2-Peer.
By this definition, distributed applications like SETI@Home are Peer-2-Peer, and to a lesser extent so are chat systems (like ICQ, IRC, Jabber, etc.) What distinguishes distributed applications from Peer-2-Peer applications is communication between peers. Distributed applications communicate with a central server, and not with other peers.
Always remember: Peer-2-Peer is a technology, not an application. Think of Peer-2-Peer the same way you think of your application's GUI or database functionality. These are technologies that allow your application to do its task. Peer-2-Peer is a means to an end, not the destination.
This doesn't attempt to be a complete history of Peer-2-Peer networking. Such a history would also cover DNS and Usenet, as well as others of the latest Peer-2-Peer technologies. This technology continues to evolve quickly.
Internet Relay Chat (IRC) predates the popularization of Peer-2-Peer by a long time. The origin of IRC can be traced back to the Department of Information Processing Science at the University of Oulu ( http://www.oulu.fi/english/index.html ), Finland during the latter part of August 1988. IRC consists of distributed servers that relay chat information between each other. A set of these servers is called a net. A user, or client, connects to one of these servers and joins a channel (like a chat room). Once they are in a channel their chat is relayed to every other client in that channel on that net. There are hundreds of established nets available and a given net can have thousands of channels.
The popularization of, and current
innovation with Peer-2-Peer can be traced to
Napster. In May of 1999 Napster
went on-line offering end users the ability to connect and
share their favorite music directly with other end
users. No worry about finding a server to upload the
music to or the limits and rules that come with one. About
this time broadband connections, mostly DSL, were very
popular in people's homes. Most everyone else had 56K
dial-up connections. Life was good for the music
sharers.
Almost immediately Napster started to face legal challenges. From a technical standpoint it was pointed out that since Napster had a single central server it could be shut down or regulated. At this point there were many clones of Napster. Many were the result of reverse engineering the client and protocol for compatibility and improvements, while others were the same idea, "only better". All had the same architecture: one central server with multiple clients connected to it. The central server facilitated the client connections and the searches. Once the desired song was found the server then facilitated a direct connection between the two clients, so they could make a peer to peer connection.
In current news Napster is actually returning. They are currently owned by Roxio and will begin offering a fee based music service similar to iTunes on October 29th.
The next major news making innovation,
and arguably the first mass accepted, true Peer-2-Peer
technology is Gnutella. Originally by Justin
Frankel and Tom Pepper (of WinAmp / Nullsoft fame), Gnutella was
invented and released in March 2000. It didn't take
long before employer AOL condemned and removed it as an
"Unauthorized Freelance Project". The motivation
for such a move was most likely influenced by Time Warner,
a publisher of music that had recently been purchased by AOL.
Fortunately for the Peer-2-Peer community the cat was out of the bag. Gnutella didn't come with source code or any documentation, but a group of very talented individuals reverse engineered the protocol and started making new versions and improvements, releasing almost all the work as open source. The open source development really helped the evolution of the technology.
Many people were critical of Gnutella and
argued (and continue to argue) that it didn't scale well and
searches could take a while since each node had to report
the results independently. About this time we see
the rise of FastTrack. FastTrack is not an
application, but a technology for building other
applications. Sharman Networks' Kazaa and
MusicCity's Morpheus are two such applications (although
Morpheus later changed what it used). Unlike
Gnutella, FastTrack was a closed protocol with developers
working for the company dedicated to developing it.
MusicCity isn't new to Peer-2-Peer networking; they were
veterans from the Napster clone era. FastTrack could
be seen as combining Napster (fast searches) and Gnutella
(more distributed architecture). It offers a
Super Peer Decentralized network
(see below).
Also in the mix, based on a July 1999
paper, is Freenet. Unlike all the other
networks, Freenet does not fall under the traditional "file
sharing" header. Freenet is motivated to create a free
speech network. This results in a drastically
different architecture that focuses on anonymity and not
file searches. Freenet offers many creative solutions
to anonymity and persistence of information. A
"Democratic" network is used to determine what files remain
on the network, and where they are located. Freenet is
still in the process of being realized and completed.
The latest release (as of November 2003) is version
0.6.
WASTE is the latest entry into the mass
public Peer-2-Peer arena.
In the Established Networks / Protocols section we will take a closer look at these networks and how they work.
So why would someone want to implement a Peer-2-Peer system beyond sharing their favorite music with friends? A lot of the excitement with this technology is that it is such a change from the technologies of the time. The thought is that since it is so "radical" then it must be useful, and whoever can be the first to find a use that is not legally challenged and generates revenue wins the prize.
Possible uses include:
Just as networks have specific topologies and architectures, so do different Peer-2-Peer systems. As you will see, they are similar to the network topologies discussed earlier.
The centralized system makes use of a
single server that all communication passes through.
Think of this as a star topology - all the nodes connect to
the central hub. Many may argue that this is not a
Peer-2-Peer system at all since the peers do not
communicate directly. Common examples are
SETI@Home or most chat programs. The advantage of
centralized is that the server is a known location and is
easy to connect to. Since everything passes through
the server then caching can be used to increase
performance. There is no network fragmentation since
there is only one level of connections. The
disadvantage is that communication is bottlenecked at the
central server, and if that server goes down then the
entire network goes down.
While still making use of a central
server, a brokered system uses a small number of central
servers to arrange for connections of peers. This
server may provide other services to aid in the matching
of the peers. Kind of like a dating service for
computers. Once the match is made then the
individual nodes communicate directly. Each node
only ever knows about the central server and any other
nodes that the central server introduces it to.
Beyond those few nodes the rest might as well not
exist. Examples of brokered include Napster and chat
networks that allow direct connection. The advantage
of brokered is that it has the performance of centralized
but also allows direct peer connections to relieve some of
the bandwidth constraints of the central server while
indexes can still be searched on the central
server.
Decentralized is the true Peer-2-Peer
architecture - no servers, just peers. A new node
connects to an existing node that then introduces it to
some of the other nodes. No specific central server
node coordinates the network. Two separate
decentralized systems could be joined by a single node
that connects to both of them. Once that node
introduces nodes from the two systems to each other then
more connections between the systems can be made.
Gnutella and Freenet are examples of decentralized
networks. The advantage of decentralized is there is
no weak point to take out the entire network at
once. It is very much a grassroots network.
The disadvantage is that a query of the network must make
many hops to reach many nodes, which takes much
longer. If the network is very large then you will
most likely never query the entire network.
If decentralized is true Peer-2-Peer, then equal peer decentralized is the truest and purest. No node is any more important than any other; there is total equality at the architecture level. Every node is connected to a few other nodes, and each acts as an equal peer (both client and server).

Super peer is a hybrid between brokered and decentralized, but would still be considered a sub-type to decentralized. In a super peer system if a specific node has certain environmental advantages (good connection speed, high visibility and long uptime) then it acts as a super node. As a super node it behaves like a mini brokered architecture server within a larger decentralized network. Each peer (including super peers) would be connected to one or more other super peers. If a single super peer is lost then the remaining super peers (that are typically already connected) pick up the slack until a new super peer takes its place. So while remaining decentralized this has the advantages of a brokered system, although still not as fast.
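How a node decides it is fit to be a super peer varies from system to system. As a rough Python sketch only, the check might look like the following; the Peer fields and the threshold values are assumptions chosen for illustration and are not taken from any real network.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    name: str
    bandwidth_kbps: int       # connection speed
    publicly_reachable: bool  # high visibility: other peers can connect in
    uptime_hours: float       # how long the node has stayed on-line


def is_super_peer_candidate(peer, min_bandwidth_kbps=512, min_uptime_hours=24.0):
    """Hypothetical rule: a node needs all three advantages to be promoted."""
    return (
        peer.publicly_reachable
        and peer.bandwidth_kbps >= min_bandwidth_kbps
        and peer.uptime_hours >= min_uptime_hours
    )


peers = [
    Peer("dialup-user", 56, True, 2.0),
    Peer("dsl-behind-firewall", 1500, False, 200.0),
    Peer("dsl-always-on", 1500, True, 200.0),
]
print([p.name for p in peers if is_super_peer_candidate(p)])
```

Only the always on, reachable broadband node qualifies here. The other two stay ordinary peers and connect to a super peer instead.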
There are many more systems than those that are or could be mentioned here. These were chosen as common systems that provide an example of the architectures defined above, not because they are any better than the many other systems out there.
Napster is a Brokered system.
Every peer connected to the central Napster server (may it
rest in peace) and told the server what files it had
available and how other peers could reach it. Then
when a node was looking for a specific file or peer it asked
the central server, which responded with a listing of files
and/or connection information for the peers serving them. At that point
the two peers connected to each other to transfer the
file.
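The broker's job can be sketched as a simple index. This Python example is not the Napster protocol; the class, method names and addresses are invented to show the idea of registering shared files and answering searches with peer connection information.

```python
class Broker:
    """Hypothetical central index: maps file names to the peers sharing them."""

    def __init__(self):
        self.index = {}  # file name -> set of peer addresses

    def register(self, peer_address, file_names):
        """A peer tells the broker what it shares and how to reach it."""
        for name in file_names:
            self.index.setdefault(name, set()).add(peer_address)

    def unregister(self, peer_address):
        """Remove a peer that has disconnected."""
        for peers in self.index.values():
            peers.discard(peer_address)

    def search(self, term):
        """Return {file name: sorted peer addresses} for names containing term."""
        term = term.lower()
        return {
            name: sorted(peers)
            for name, peers in self.index.items()
            if term in name.lower() and peers
        }


broker = Broker()
broker.register("10.0.0.5:6699", ["song_one.mp3", "song_two.mp3"])
broker.register("10.0.0.9:6699", ["song_two.mp3"])
print(broker.search("two"))
```

The broker never touches the file itself. It only hands back addresses, and the requesting peer then connects directly to one of them for the transfer.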
Originally Gnutella was an Equal Peer
Decentralized architecture. Any new node could
connect to any existing peer in the network and then have
access to the network. Once connected to a peer it
then had access to every peer connected to that peer,
which continues out in a ripple effect until the Time To
Live (TTL) expires. The time to live gives each
query a finite life span of a specific number of jumps to
prevent the entire network from being saturated by long
living queries.
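The ripple effect and the time to live can be simulated in a few lines of Python. This is a simplified model and not the Gnutella wire protocol: the function name and sample network are made up, and a shared "seen" set stands in for the duplicate detection that each real node would do on its own.

```python
def flood_query(links, files, origin, wanted, ttl):
    """Flood a query outward from origin; return peers holding wanted within ttl hops."""
    hits = set()
    seen = {origin}
    frontier = [origin]
    while frontier and ttl > 0:
        next_frontier = []
        for peer in frontier:
            for neighbor in links.get(peer, ()):
                if neighbor in seen:
                    continue  # this peer has already handled the query
                seen.add(neighbor)
                if wanted in files.get(neighbor, ()):
                    hits.add(neighbor)
                next_frontier.append(neighbor)
        frontier = next_frontier
        ttl -= 1  # each jump uses up one unit of the time to live
    return hits


# Hypothetical chain of peers A - B - C - D, where only D has the file.
links = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
files = {"D": ["song.mp3"]}
print(flood_query(links, files, "A", "song.mp3", ttl=2))  # query dies before reaching D
print(flood_query(links, files, "A", "song.mp3", ttl=3))  # query reaches D
```

A larger time to live finds more, but every extra jump multiplies the number of peers that must handle the query, which is the scalability trade-off discussed above.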
With time Gnutella evolved to the Super Peer Decentralized architecture. This helped answer the issues of scalability and slow search speeds. With this system a new node connects to one or more super peers and once connected has access to all peers (and super peers) connected to that super peer, continuing out in a ripple pattern until the time to live expires.
Gnutella's major advantage over and
difference from Napster era systems is there is no central
server. For a user to connect to a Gnutella network
they only need to know the address of one other machine on
the network. Once that connection is made then that
node will discover other nodes on the network until a few
connections are made. Ideally the connections should
be as far apart as possible (see diagram to left).
The fewer hops for an alternative route between nodes the
more valuable the links between those nodes. If any
node in the network goes down, then the nodes that were
connected to that one just connect to a different
node. This redundant nature is very similar to the
basic redundant nature of the Internet
itself. 
Since no central server is required, there can be multiple unconnected Gnutella networks. This would allow for a corporation to set up a Gnutella network within their firewalls. Then they could regulate what the network is used for and not need to worry about outsiders accessing it.
As mentioned earlier, Freenet is unique among Peer-2-Peer systems. It is similar to Gnutella in communication architecture, but the content of the network does not stay put. Each time a file is requested a copy is made on the nodes closer to the requesting node. This makes it more convenient the next time it is requested from the same location.
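The effect of this copying can be shown with a toy model in Python. It leaves out everything that makes Freenet what it is (keys, encryption, anonymity and its real routing), and the function and node names are invented. It only shows a file moving closer to where it was asked for.

```python
def fetch_and_cache(stores, path, key):
    """Walk path outward from the requester; cache the file on the way back.

    stores maps node name -> {key: data}. Returns (data, hops) or (None, None).
    """
    for hops, node in enumerate(path):
        if key in stores.get(node, {}):
            data = stores[node][key]
            # Leave a copy on every node between the requester and the holder.
            for earlier in path[:hops]:
                stores.setdefault(earlier, {})[key] = data
            return data, hops
    return None, None


stores = {"D": {"report.txt": b"contents"}}
print(fetch_and_cache(stores, ["A", "B", "C", "D"], "report.txt")[1])  # hops for the first request
print(fetch_and_cache(stores, ["E", "B", "C", "D"], "report.txt")[1])  # hops once B holds a copy
```

The second request, coming from a different node in the same part of the network, is answered by B instead of travelling all the way to D.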
The folks at Nullsoft are at it again with the release of WASTE. Once again AOL removed their release shortly after it was made available. This time the source code was also released under the GPL and a project was established on SourceForge.net. WASTE is a mesh-based workgroup tool that allows for RSA encrypted communication between small workgroups of users. The network is actually a partial mesh, with every possible connection made, limited by firewalls and routers. Communication is then routed over the network along the route of lowest latency, which allows communication between firewalled peers via a non-firewalled peer.
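How WASTE itself picks the route is not covered here. As an illustration of the general idea, this Python sketch uses Dijkstra's algorithm, a standard shortest path method, to find the lowest latency route across a partial mesh. The peers and latency figures are hypothetical.

```python
import heapq


def lowest_latency_route(latency, source, target):
    """Dijkstra's algorithm over latency[a][b] in milliseconds.

    Returns (total latency, path) or None if target cannot be reached.
    """
    queue = [(0, source, [source])]
    best = {}
    while queue:
        total, node, path = heapq.heappop(queue)
        if node == target:
            return total, path
        if node in best and best[node] <= total:
            continue  # already reached this node by a route at least as fast
        best[node] = total
        for neighbor, cost in latency.get(node, {}).items():
            heapq.heappush(queue, (total + cost, neighbor, path + [neighbor]))
    return None


# A and B are both firewalled, so there is no direct A-B link.
# C and D are non-firewalled peers that both of them connected out to.
latency = {
    "A": {"C": 40, "D": 15},
    "B": {"C": 30, "D": 20},
    "C": {"A": 40, "B": 30},
    "D": {"A": 15, "B": 20},
}
print(lowest_latency_route(latency, "A", "B"))
```

Traffic between the two firewalled peers is relayed through D, because that route (15 + 20 ms) beats the one through C (40 + 30 ms).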
Typical IRC use for chatting is a distributed centralized network - all communication passes through a set of distributed centralized servers. There is also a Client-to-Client Protocol available. When using CTCP (Client To Client Protocol) for chatting, or DCC (Direct Client Connect) for transfers, two clients can connect directly together, bypassing the central servers. Sometimes IRC networks will split, due to technical difficulties. If clients are connected with CTCP or DCC then they are immune to such splits. Using CTCP and DCC makes IRC into a brokered network, much like Napster, with the centralized server(s) introducing the peers to each other.
Firewalls and routers are two different devices technologically, but they have a similar "one way" effect on network traffic. Routers allow multiple internal machines to share a single IP address and connection out to the external network (Internet). Firewalls are very specific in what they allow to pass into and out of a network. Many times a router includes a firewall, or firewall functionality. In most configurations both firewalls and routers allow most or all traffic to flow out of a network and little or no traffic to flow in. For the purposes of the discussion here they will be referred to interchangeably. The deep technical details of configuring and differentiating the two are beyond the scope of this paper.
Firewalls provide many different types of security. Typically, on a fairly relaxed network, any internal client can connect out to any external server on any port. Some networks may restrict certain ports or even certain servers, but that is more of an administrative choice than a network technology issue. Through the use of NAT (Network Address Translation) an external connection can be opened to a specific internal machine; without this configuration it is unusual for an external client to be able to connect to an internal server. The downside of NAT is that the firewall must be configured for each port and machine that is to receive a connection, and each port can only be assigned to one internal machine. In this section we will assume that the firewalls allow all traffic to pass out of a network and no traffic to pass into the network, which is a very common configuration. For most client-server applications on the Internet this isn't a big deal, and is in fact a great feature for securing a network.
For Peer-2-Peer, where two peers are trying to make a connection through a firewall, this is a big deal. If the peer behind the firewall is connecting out to the other peer then this will work fine. But if the peer outside the firewall is trying to connect through to the peer inside the firewall it will not be able to. If both peers are behind different firewalls then neither will be able to connect directly to the other.
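The three cases above boil down to a very small rule. The Python sketch below (not part of the Delphi examples) encodes it under the assumption stated earlier: every firewall lets all traffic out and no traffic in. The function names are made up for illustration.

```python
def can_connect(initiator_firewalled, target_firewalled):
    """True if the initiator can open a connection to the target."""
    # Outbound traffic always passes, so the initiator's own firewall
    # never matters; only the target's firewall can block the connection.
    return not target_firewalled

def direct_link_possible(a_firewalled, b_firewalled):
    """True if at least one of the two peers can connect to the other."""
    return (can_connect(a_firewalled, b_firewalled)
            or can_connect(b_firewalled, a_firewalled))
```

With one firewalled peer a direct link is still possible, as long as the firewalled peer is the one that initiates it. With two firewalled peers no direct link is possible in either direction.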
There are a couple of solutions to this dilemma beyond changing your network configuration. The solutions depend on what type of Peer-2-Peer network we are working within.
For a centralized network the individual peers do not connect directly to each other. They all connect out to the central server. As long as the central server is available as a server then there is no problem. So the central server could be behind the same firewall as all the clients (as seen in a corporate LAN setup). If the server were behind another firewall, then the firewall would need to be configured to allow external traffic to connect to it.
For a brokered network, the configuration of the central server would need to be the same as in a centralized network. When it is necessary to connect to an individual peer (client) from outside a firewall, the connecting peer would need to route a "push" request through the central server to the target peer. When a peer receives a push request it then connects out to the peer that is trying to connect to it, reversing the direction of the connection. Most protocols (e.g. Gnutella, Napster, and FastTrack) use a method similar to this to traverse firewalls.
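The push mechanism can be modeled without any real sockets. The following Python sketch is an in-memory illustration only, not the wire format of Gnutella, Napster or FastTrack; the Peer and Broker classes and their methods are hypothetical names. It assumes every peer has already connected out to the broker, and that firewalls block all inbound connections.

```python
class Peer:
    def __init__(self, name, firewalled):
        self.name = name
        self.firewalled = firewalled
        self.connections = []  # names of peers we share a connection with

    def connect_to(self, other):
        """Open an outbound connection; fails if the target is firewalled."""
        if other.firewalled:
            return False
        self.connections.append(other.name)
        other.connections.append(self.name)
        return True

class Broker:
    """Central server that every peer has already connected out to."""
    def __init__(self):
        self.peers = {}

    def register(self, peer):
        self.peers[peer.name] = peer

    def push_request(self, requester, target_name):
        # Relay the request; the target then connects OUT to the requester,
        # reversing the direction of the connection.
        target = self.peers[target_name]
        return target.connect_to(requester)

def connect(broker, requester, target):
    """Try a direct connection first, then fall back to a push request."""
    if requester.connect_to(target):
        return "direct"
    if broker.push_request(requester, target.name):
        return "push"
    return "failed"
```

Note that the push only helps when the requester itself is reachable. If both peers are firewalled, connect returns "failed", which is the case the next paragraph deals with.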
If both peers are behind different firewalls then an intermediary is needed. This intermediary would tunnel the connection between the two peers. The intermediary could be a central server or a peer that both other peers can connect to. Many networks do not use the intermediary since that intermediary could end up consuming a large portion of the peer's bandwidth.
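A minimal in-memory Python sketch of such an intermediary follows. It is only an illustration under the assumption that both firewalled peers have already connected out to the relay; the Relay class and its method names are invented for this example. The byte counter shows why relaying is costly: every byte exchanged by the two peers has to be received and sent again by the intermediary.

```python
from collections import deque

class Relay:
    """Intermediary that both firewalled peers have connected out to."""
    def __init__(self):
        self.inboxes = {}       # peer name -> queue of (sender, data)
        self.bytes_relayed = 0  # payload bytes that passed through the relay

    def register(self, peer_name):
        self.inboxes[peer_name] = deque()

    def send(self, sender, recipient, data):
        # The sender never talks to the recipient directly.
        self.inboxes[recipient].append((sender, data))
        self.bytes_relayed += len(data)

    def receive(self, peer_name):
        """Return the next (sender, data) pair, or None if nothing is waiting."""
        inbox = self.inboxes[peer_name]
        return inbox.popleft() if inbox else None
```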
A distributed network is very similar to a brokered one, except that instead of sending push requests through a central server, the request is sent back through the peers that the query traveled through.
Freenet is an example of a distributed network that sends file transfers through intermediaries. In fact all transfers are through intermediaries. This provides the anonymity that is crucial to the design goals of Freenet. Since the requester never connects to the source, the requester never knows where the file originally came from, protecting the source's identity. The downside is speed: it takes much longer to move a file through all the nodes than to transmit it directly.
For networks that support routing (Freenet and WASTE), firewalls are traversed much more easily.
The programming examples will concentrate mostly on distributed networks, but will also look at brokered ones.
The simplest Peer-2-Peer application will have a single client socket (connecting out) and a single server socket (accepting incoming connections). A server socket can accept multiple incoming connections while a client socket can only make a single outgoing connection, so the next more advanced application would have multiple client sockets and a single server socket.
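Before moving on to the Indy components, here is the same idea as a short, language-neutral sketch in Python using plain sockets. It is not the Delphi example from this paper; it only shows one listening socket and one outgoing socket in a single process, connected to each other over the loopback interface. The function name and the message text are made up.

```python
import socket

def loopback_demo():
    """Connect a peer's client socket to its own server socket and send a line."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Server socket: listens, and could accept many incoming connections.
        server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        server.listen()
        server.settimeout(5)
        port = server.getsockname()[1]

        # Client socket: makes exactly one outgoing connection.
        client.settimeout(5)
        client.connect(("127.0.0.1", port))

        incoming, _address = server.accept()
        with incoming:
            incoming.settimeout(5)
            client.sendall(b"hello from the client socket\n")
            with incoming.makefile("rb") as reader:
                return reader.readline()
    finally:
        client.close()
        server.close()
```

A real peer would keep both sockets open and service them continuously; the sketch closes everything after one line so that it can be run as is.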
For this first example we will use one TIdTCPClient socket and one TIdTCPServer socket. The simplest p2p chat would only allow a total of two chatters. Not very exciting. This example will allow a (theoretically) unlimited number of nodes. It will be completely decentralized, with all chat messages passing through intermediaries over the peer connections. Each message will be broadcast to the entire group and each node will re-broadcast the message if it has not previously broadcast it. There is no Time To Live, so each message will travel the entire network regardless of size.
Take a form named p2pChatForm, add a TIdTCPClient named ClientSocket and a TIdTCPServer named ServerSocket. Also include a TIdAntiFreeze to keep the UI from freezing. Arrange 3 TEdits, a TCheckBox, 2 TButtons, 2 TMemos, a TTimer and a TStatusBar with two panels so it looks something like you see to the right. Name the TEdits ListenEdit, HostEdit and NickEdit. Name the TCheckBox ListenCheck. Name the TMemos LogMemo (on top) and EditMemo (on bottom). Name your TButtons SendBtn and OutBtn. Rename the TTimer to ClientPoll and set the interval to 100 milliseconds. Finally name the TStatusBar StatusBar. Use TSplitters and TPanels to get the layout you want.
First let's add a routine to handle received messages:
This routine takes a single string as a parameter. If that string is empty or already displayed in the LogMemo then it is ignored. Otherwise the string is added to the LogMemo and then Broadcast on all connections. It is important to check for empty strings as we will see in the client polling. Instead of separately tracking received messages we just check to see if it is displayed in the log. All messages will have the sender's nickname as well as the time the message is sent to reduce the odds of a duplicate. This isn't a perfect implementation, but will suffice for our demo.
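The logic of this routine can be tried out without any networking. The sketch below is in Python and is not the Delphi source that accompanies this paper: the Node class is hypothetical, its log list stands in for LogMemo, and its neighbours list stands in for the open connections. It shows that ignoring empty and already-logged messages is what stops a message from circulating forever, even with no Time To Live.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.log = []         # stands in for LogMemo
        self.neighbours = []  # stands in for the open connections

    def link(self, other):
        """Create a two-way connection between two nodes."""
        self.neighbours.append(other)
        other.neighbours.append(self)

    def received(self, message):
        # Ignore empty strings and anything already shown in the log.
        if not message or message in self.log:
            return
        self.log.append(message)
        self.broadcast(message)

    def broadcast(self, message):
        # Pass the message on over every connection, including the one
        # it arrived on; the duplicate check on the other side drops it.
        for neighbour in self.neighbours:
            neighbour.received(message)

# Three nodes connected in a triangle, so every message can loop back.
a, b, c = Node("a"), Node("b"), Node("c")
a.link(b)
b.link(c)
c.link(a)
a.received("[12:00:01] alice: hello")
```

After the call each node has logged the message exactly once, despite the loop in the network.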
Now for the Broadcast routine:
This routine also takes a single string parameter. If our ClientSocket has a connection going out then we send that string out on that connection. Next we obtain a lock on the Peer Thread list for the ServerSocket by calling the LockList method, which returns the list of threads. We iterate through the threads and write the string to each one. Finally we unlock the list. Since we don't actually have a new instance of the list we DO NOT free it.
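The lock, iterate, unlock pattern is not specific to Indy. As a rough Python illustration (not the Delphi code from this paper), the sketch below uses a hypothetical ConnectionList class whose lock_list and unlock_list methods loosely mirror the calls described above. Plain Python lists stand in for the sockets, so "writing" a message just appends it.

```python
import threading

class ConnectionList:
    """Thread-safe list of incoming connections (hypothetical helper)."""
    def __init__(self):
        self._lock = threading.Lock()
        self._items = []

    def add(self, connection):
        with self._lock:
            self._items.append(connection)

    def lock_list(self):
        # Returns the live list, not a copy, so the caller must not
        # dispose of it; it only has to call unlock_list when done.
        self._lock.acquire()
        return self._items

    def unlock_list(self):
        self._lock.release()

def broadcast(message, outgoing, incoming):
    """Send message on the outgoing connection (if any) and all incoming ones."""
    if outgoing is not None:
        outgoing.append(message)
    connections = incoming.lock_list()
    try:
        for connection in connections:
            connection.append(message)
    finally:
        # Always release the lock, even if a write fails.
        incoming.unlock_list()
```

The try/finally guarantees the list is unlocked even when one of the writes raises an exception; otherwise every later broadcast would block.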
Handling incoming messages on a TIdTCPServer socket is as simple as assigning an OnExecute event handler that calls our routine to handle received messages.
But what about the TIdTCPClient sockets? Client sockets are typically outgoing sockets and do not have an event for incoming data. This is what our ClientPoll TTimer accomplishes in its OnTimer event.
By specifying a TimeOut when calling ReadLn on the TIdTCPClient we can just read data if it is available. If no data is available then ReadLn returns an empty string. This is why it is important that we ignored empty strings in our Received routine.
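The same polling idea can be sketched in Python with plain sockets. This is an illustration only, not Indy's ReadLn: the LinePoller class is a made-up helper that returns one line if a complete one is available within the timeout, and an empty string otherwise. A timer would call read_line at a regular interval and hand the result to the routine that handles received messages.

```python
import socket

class LinePoller:
    """Reads newline-terminated lines from a socket without blocking for long."""
    def __init__(self, sock):
        self.sock = sock
        self.buffer = b""

    def read_line(self, timeout):
        """Return one line without its terminator, or '' if none arrives in time."""
        if b"\n" not in self.buffer:
            self.sock.settimeout(timeout)
            try:
                self.buffer += self.sock.recv(4096)
            except (socket.timeout, BlockingIOError):
                return ""  # nothing arrived within the timeout
        if b"\n" not in self.buffer:
            return ""      # only part of a line so far; try again next poll
        line, _, self.buffer = self.buffer.partition(b"\n")
        return line.decode("utf-8").rstrip("\r")
```

Because several lines can arrive in one read, the helper keeps the leftover bytes in a buffer and returns them on later calls before touching the socket again.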
The remainder of the application is fairly straightforward and I will leave it to you to explore the accompanying source code. A couple of notes on the program: because we use the LogMemo to see if we previously broadcast a message, it is important that we have WordWrap turned off, since wrapping will change a line that is longer than one line. The solution to this would be to keep a separate log of processed messages (with better identifiers!). Also, this application can connect to itself if you want, or you can run multiple instances of it on the same machine.
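One possible shape for such a separate log is sketched below in Python. It is not part of the accompanying Delphi source, and everything in it is an assumption made for illustration: the "id|nick|text" line format, the use of a random UUID as the identifier, and the SeenMessages, new_message and parse_message names.

```python
import uuid

class SeenMessages:
    """Remembers which message IDs have already been processed."""
    def __init__(self):
        self._seen = set()

    def first_time(self, message_id):
        """Return True only the first time a given ID is offered."""
        if message_id in self._seen:
            return False
        self._seen.add(message_id)
        return True

def new_message(nick, text):
    # Assumed wire format: "<id>|<nick>|<text>". The nickname must not
    # contain "|"; the text may, because parsing splits only twice.
    return f"{uuid.uuid4().hex}|{nick}|{text}"

def parse_message(line):
    message_id, nick, text = line.split("|", 2)
    return message_id, nick, text
```

With an identifier on every message, two identical lines of chat sent at the same moment are no longer mistaken for duplicates, and the display can wrap or reformat text freely. The set grows without bound in this sketch; a real client would need to expire old identifiers.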
http://www.gnucleus.com/GnucDNA/home.html
Gnucleus DNA is a COM component that provides all the behind the scenes support to creating a Gnutella client. The first example just covers some of the basics for using this component.
http://gnucdnadelphi.sourceforge.net/
Gnuminous is a full-fledged Gnutella client written in Delphi with Gnucleus DNA. We take a look at some of the code and what it would take to build our own Gnucleus DNA client.

GPU is an open source (GPL) Gnutella client for sharing files and CPU resources, with the goal of building a supercomputer. It is a pure Delphi implementation.

MsgConnect is a library, available in pure Delphi or a number of other languages (C++, Java, C#), for building Peer-to-Peer applications on the Windows, Linux, Java and Palm platforms. An ActiveX control and a DLL are also provided for languages that are not natively supported. MsgConnect is a commercial library provided by EldoS, and can be licensed under the GPL (thus making your application GPL) or a standard commercial license.