Title: BCodec
Description: Encoding and Decoding the torrent datafile with BCodec.
Author: Paulo Santos
eMail: pjondevelopment@gmail.com
Environment: VB.NET 2005 (Visual Studio 2005)
Keywords: BEncode, BDecode, torrent

BCodec

DISCLAIMER

The Software is provided "AS IS", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and non-infringement. In no event shall the authors or copyright holders be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the software or the use or other dealings in the software.

Introduction

Last year, I finally upgraded my broadband connection from a lazy DSL to a whopping T1. Only after that did I come into contact with BitTorrent. There was some commotion in the Yahoo Group that supported its development, and I felt an urge to create something.

My natural development environment nowadays is the .NET platform, so I decided to write a BitTorrent client in VB.NET.

I gathered as much information about the protocol and file format as I could and started coding.

A few months after I started creating my client, the creator of BitTorrent decided to shut down the support group (as you can read here). Somehow that affected my desire to continue with my new creation, and I abandoned the project.

I did create some working classes for this project, and I am making one of them available here: BCodec. Since I am now playing around with Visual Studio 2005, I converted the entire class to work in this new environment.

BEncoding

BEncoding is a way of encoding information, like XML. It supports four types of data:

  • String
  • Integer
  • List
  • Dictionary

BEncoding Strings

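BEncoding handles a string as a plain sequence of bytes, so in practice the only text it supports as-is is ASCII. Unicode characters are not directly supported by this format. However, many developers work around this limitation by encoding strings as UTF-8 before BEncoding them, in which case the length prefix counts bytes rather than characters.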

The best way of explaining how a string is BEncoded is with an example:

15:BEncoded_String

We have the length of the string followed by a colon and then the string itself. The above string is 15 bytes long, for instance.

Each character is taken at its binary value. For instance, a byte &H0 following the colon would represent Chr(0), not an empty string.
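
The string rule can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the BCodec class itself; the function names bencode_string and bencode_text are hypothetical. It assumes the length prefix counts bytes, which is why the UTF-8 workaround yields a prefix larger than the character count.

```python
def bencode_string(data: bytes) -> bytes:
    """Return the BEncoded form of a byte string: <length>:<bytes>."""
    # The prefix is the byte count, written in base ten as ASCII digits.
    return str(len(data)).encode("ascii") + b":" + data


def bencode_text(text: str) -> bytes:
    """Encode text as UTF-8 first (the common workaround), then BEncode it."""
    return bencode_string(text.encode("utf-8"))


print(bencode_string(b"BEncoded_String"))  # b'15:BEncoded_String'
print(bencode_string(b"\x00"))             # b'1:\x00' (one byte, not an empty string)
print(bencode_text("caf\u00e9"))           # b'5:caf\xc3\xa9' (4 characters, 5 bytes)
```

An empty string is encoded as 0: followed by nothing.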

BEncoding Integers

Integers are the only numeric data type supported by BEncoding. They are encoded in their human-readable decimal representation, with a leading "i" and a trailing "e".

For instance:

i2010e

Represents the integer 2010.
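
As a rough sketch of the integer rule (Python, for illustration only; bencode_int and bdecode_int are hypothetical names, not part of BCodec), encoding wraps the decimal digits in "i" and "e", and decoding reads everything up to the first "e". The decoder here is deliberately lenient and does not reject malformed digits such as leading zeros.

```python
from typing import Tuple


def bencode_int(value: int) -> bytes:
    """Wrap the decimal representation of value between 'i' and 'e'."""
    return b"i" + str(value).encode("ascii") + b"e"


def bdecode_int(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Read an integer starting at pos; return (value, index just past the 'e')."""
    if data[pos:pos + 1] != b"i":
        raise ValueError("expected 'i' at offset %d" % pos)
    end = data.index(b"e", pos)  # raises ValueError when the trailing 'e' is missing
    return int(data[pos + 1:end].decode("ascii")), end + 1


print(bencode_int(2010))         # b'i2010e'
print(bdecode_int(b"i2010e"))    # (2010, 6)
```

Returning the position after the "e" lets a caller continue decoding whatever follows the integer in the same buffer.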

BEncoding Lists

A list is an ordered sequence of other data types. It can contain any data type supported by BEncoding, including other lists. Lists are encoded with a leading "l" and a trailing "e".

Each element in the list is also BEncoded.

Example:

l13:I am a String18:Next is an Integeri789ee

The above example is decoded as:

["I am a String", "Next is an Integer", 789]
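
Because each element carries its own length or terminator, a list can be decoded with a small recursive routine. The following Python sketch is an illustration under stated assumptions, not the BCodec implementation: the name bdecode is hypothetical, strings come back as raw bytes, and only strings, integers, and (nested) lists are handled.

```python
from typing import Any, Tuple


def bdecode(data: bytes, pos: int = 0) -> Tuple[Any, int]:
    """Decode one string, integer, or list at pos; return (value, next position)."""
    lead = data[pos:pos + 1]
    if lead == b"i":
        end = data.index(b"e", pos)
        return int(data[pos + 1:end].decode("ascii")), end + 1
    if lead == b"l":
        items = []
        pos += 1
        # Keep decoding elements until the list's own trailing 'e' is reached.
        while data[pos:pos + 1] != b"e":
            if pos >= len(data):
                raise ValueError("list is missing its trailing 'e'")
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead.isdigit():
        colon = data.index(b":", pos)
        length = int(data[pos:colon].decode("ascii"))
        start = colon + 1
        if start + length > len(data):
            raise ValueError("string is truncated")
        return data[start:start + length], start + length
    raise ValueError("unexpected byte at offset %d" % pos)


value, _ = bdecode(b"l13:I am a String18:Next is an Integeri789ee")
print(value)  # [b'I am a String', b'Next is an Integer', 789]
```

The "e" that closes the integer and the "e" that closes the list are told apart purely by position: the inner call consumes its own terminator before the loop checks for the list's.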

BEncoding Dictionaries

A dictionary is a special type of list. Whereas each element of a list is indexed by its position, each element of a dictionary is indexed by its key. Dictionaries are encoded with a leading "d" and a trailing "e".

A BEncoded dictionary lists the first key followed by its value, then the second key followed by its value, and so on.

Both the key and the value are BEncoded.

Example:

d7:requestl6:banana6:tomatoe6:square6:yellow5:valuei1025ee

The above example is decoded as:

{
  "request" -> ["banana", "tomato"],
  "square"  -> "yellow",
  "value"   -> 1025
}

Note that the keys appear in sorted order, as the original documentation quoted below requires.

The value of each dictionary entry can be any supported data type, including another dictionary.
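
Putting the four types together, an encoder can be sketched as a single recursive function. This is a Python illustration only, not the BCodec class; the name bencode is hypothetical, and text is assumed to be encoded as UTF-8. The sketch writes dictionary keys in sorted byte order, as the specification requires, so the output order may differ from the order in which entries were supplied.

```python
def bencode(value) -> bytes:
    """BEncode an int, str/bytes, list/tuple, or dict (hypothetical helper)."""
    if isinstance(value, bool):
        raise TypeError("booleans have no BEncoded form")  # bool is a subclass of int
    if isinstance(value, int):
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")  # assumption: text is UTF-8 encoded first
    if isinstance(value, bytes):
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, (list, tuple)):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        pairs = []
        for key, item in value.items():
            if isinstance(key, str):
                key = key.encode("utf-8")
            if not isinstance(key, bytes):
                raise TypeError("dictionary keys must be strings")
            pairs.append((key, item))
        pairs.sort(key=lambda pair: pair[0])  # raw byte order, not alphanumeric
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in pairs) + b"e"
    raise TypeError("unsupported type: %s" % type(value).__name__)


print(bencode({"square": "yellow", "value": 1025, "request": ["banana", "tomato"]}))
# b'd7:requestl6:banana6:tomatoe6:square6:yellow5:valuei1025ee'
```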

The original documentation

The original documentation on the subject reads, in full:

BEncoding is done as follows:
  • Strings are length-prefixed base ten followed by a colon and the string. For example 4:spam corresponds to 'spam'.
  • Integers are represented by an 'i' followed by the number in base 10 followed by an 'e'. For example i3e corresponds to 3 and i-3e corresponds to -3. Integers have no size limitation. i-0e is invalid. All encodings with a leading zero, such as i03e, are invalid, other than i0e, which of course corresponds to 0.
  • Lists are encoded as an 'l' followed by their elements (also BEncoded) followed by an 'e'. For example l4:spam4:eggse corresponds to ['spam', 'eggs'].
  • Dictionaries are encoded as a 'd' followed by a list of alternating keys and their corresponding values followed by an 'e'. For example, d3:cow3:moo4:spam4:eggse corresponds to {'cow': 'moo', 'spam': 'eggs'} and d4:spaml1:a1:bee corresponds to {'spam': ['a', 'b']}. Keys must be strings and appear in sorted order (sorted as raw strings, not alphanumerics).

Source: the official documentation site, http://www.bittorrent.com/protocol.html (the link is now broken).
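
Two of the stricter rules quoted above (no leading zeros or negative zero in integers, and keys in raw sorted order) are easy to overlook in a decoder. The Python sketch below shows one way they might be checked; the helper names is_valid_int_token and keys_are_sorted are hypothetical, and the sketch is not taken from BCodec.

```python
import re

# 'i', then either a lone 0 or an optional '-' followed by digits
# that do not start with 0, then 'e'.
_INT_TOKEN = re.compile(rb"i(0|-?[1-9][0-9]*)e")


def is_valid_int_token(token: bytes) -> bool:
    """True when token is a well-formed BEncoded integer such as b'i-3e'."""
    return _INT_TOKEN.fullmatch(token) is not None


def keys_are_sorted(keys) -> bool:
    """True when the keys, in the order they were read, are in raw byte order."""
    keys = list(keys)
    return keys == sorted(keys)


print(is_valid_int_token(b"i0e"))    # True
print(is_valid_int_token(b"i-0e"))   # False
print(is_valid_int_token(b"i03e"))   # False
print(keys_are_sorted([b"cow", b"spam"]))     # True
print(keys_are_sorted([b"Zebra", b"apple"]))  # True: 'Z' sorts before 'a' as raw bytes
```

Comparing keys as bytes rather than as text is what "sorted as raw strings, not alphanumerics" amounts to: uppercase letters sort before lowercase ones.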

Update: For some strange reason the original file was lost, and I had to replace it with a backup copy. I'm sorry for any inconvenience.