Infinite Dimension: 2009

Friday, February 27, 2009

The Be-All End-All of the Internet

One could say that's the lesser-known name for the World Wide Web. To many people even, the Web and the Internet are synonymous, which makes sense since they arrived in the public consciousness simultaneously. For anyone using computers and computer networks between 1992 and 1994, they're likely to remember the transition from modems and bulletin board systems (BBS) to the Internet. Certainly, the Internet had been around for quite a while, but the use of it was frankly not that appealing until one was exposed to their first World Wide Web client.

For me, sure, I was pretty familiar with Telnet and various UNIX email clients that worked completely in text mode. (Truly hardcore geeks will attest that is the real Internet, not all this Web noise. I am certainly not that hardcore of a geek.) There was also the Gopher service, which could fetch purely textual documents, which had -- interestingly enough -- the ability to hyper-link between pages. Yeah, you could say it was the Web without graphics and glitz, but the rendering power of an eighty-by-twenty-five character terminal piques the interest of the previously mentioned hardcore geeks and science researchers yet few others.

At my fall co-op job in '93, another student was showing off some new Windows 3.1 program called NCSA Mosaic. Using Mosaic, he pulled up some new documents he had just created for himself, and he called it his "personal Web Site" (fanfare). This page had some exciting tiled background, images of himself, spinning graphics, an under-construction graphic, and of course, blinking text that declared how he had made this page himself. Of course, there were the font size and color changes, heading changes, horizontal rules, and makes-baby-Jesus-sad color schemes. Little known to me at the time, he had made use of every single HTML tag that was available.

Okay, looking back, all good HTML designers know that he was committing grievous offenses to good Web design, but at the time, there was something very compelling and exciting about this Web medium. Little did I know that this "Web browser" and "World Wide Web" thing was really going to change the world. I'm sure a lot of people have similar experiences or revelations with the web, but let's come back to modern day, and talk about what makes the Web great and whether it's truly the be-all that some aspire to label it.

My first exposure to the Web demonstrates how easy it is, first, to get on the web and secondly, to have a web presence. Once you're in front of an Internet-connected computer (or device) with a browser open, all you have to do is know a Web address, type it in the address bar, press the Enter key, and wait for it to display. From a given page, there are typically links to other pages either local to the site and often external to the site. In the latter case, portal sites with numerous external links potentially exposes its viewers to vast seas of content. With enough time and the right portal sites, people are able to traverse this web of information, which obviously has become an interesting and compelling medium for users.

Ease-of-access is only part of the story for the Web's success; without compelling waves of content to surf, users would still be looking at a sea of flashing, colored text, animated GIF images, and under-construction signs. The success of the Web must be partly attributed to how easy it is to serve information. Setting up a simple, small-scale Web server requires little more than an always-on computer, Web server software (typically free, open-source), an Internet connection, and some network configuration. A web system can be scaled up from a small site that handle a tiny trickle of traffic up to systems that handle millions of browser requests per second, such as Google.com or any number of well-traveled Web destinations. While simplicity of setting up the delivery system (the server), what you deliver is equally important.

As demonstrated by my fellow student's first Web page, creating content is quite easy, requiring just a little bit of curiosity, experimentation, and a text editor. As he also demonstrated, I wouldn't even say that taste and aesthetics are required (though highly beneficial). As the Web has risen in use and popularity, additional content creation, delivery, and management tools have been built, providing things such as WYSIWYG HTML editors, scriptable/programmable/dynamic Web pages (e.g. JavaScript, Flash, PHP, DHTML), and style management (e.g. CSS). These tools are behind the modern Web, creating a more compelling place to surf.

While barriers-to-entry are low for content viewers and content creators alike, the Web is a limited medium. Take a few seconds and think about what a Web page actually is. Or you could see what Wikipedia has to say about the Web. Anyway, some of the key concepts you may have come up with are: documents, pages, images, videos, text, hyperlinks, and if you threw out Google, Facebook, or YouTube, that's fine too.

These key words you came up with (besides Google, Facebook and YouTube, smarty-pants) all take root from the Web's origins in text delivery, lending itself well to viewing documents and traditional-textual media like books, articles, newspapers, and magazines. This can been seen in the traditional media that have moved to web (New York Times at nytimes.com, for example), things like blogs (what you're reading now), microblogs (twitter.com), and RSS technology used to publish and syndicate news and news-like media via the Internet. The Web's ability to display two-dimensional graphical media, such as images and video, have adapted it well for the same media mentioned before. This should be obvious from the success of sites like Flickr and YouTube.

Now think about your world for a bit: who and what you interact with daily, how you interact with it. You wake up, address your appearance (i.e. wash, brush, dress), eat, go to work, interact with our cars, devices, machines, coworkers, and return home, interact with family, friends, home devices and machines. This discussion is about to seem even more esoteric, but bear with me a bit. Human languages are devised to help us communicate the richness of this world, the real world. We have words for all the various objects that you could encounter in your day. Words for things with which we interact (banana, friend, machine, home, book, shirt), words that describe different ways we may act on these things (eat, talk to, pick up, go, read, wear, give) or to describe an object's state of existence (sit, stand, exist, move, fall, tears).

In this world I asked you to think about, you may read a book, a magazine, or newspaper, or you may surf the Web, too. We could decompose Web surfing into activities like reading email, finding or researching a topic on Google and Wikipedia, looking at photos on Flickr, watching a YouTube video, or updating Facebook and catching-up with friends. Now, let's compare the richness of the world's verbs with the Web's verbs. World: eat, talk, pick up, go, read, wear, give, carry, put in, take out, sit, drive, fly, the list goes on and I encourage you to spend a moment to expand on this list. On the Web, we can: read, find, research, look, watch, update, catch up.

While the Web's list is not complete, you should find that the list is a marginal subset of the verbs we use in our world-at-large. My point here is that the web is designed originally support or emulate a subset of things we do in real-life. If we want to talk to friends via the web, new protocols and additions are made to extend the Web to do these things. Web chat and Web conferencing are examples of how engineers have tried to extend the Web to be-all, that is, be a solution to all things we can think of.

From a technical aspect, a major weakness of the Web is the HTTP protocol. HTTP implements a call-and-response paradigm. Essentially, when a user enters a Web address, the browser issues a request -- or call -- to the server. When the call is received, the server responds, sending the requested page to the user's browser. Alternatively, Web browsers can submit data (from login or billing information forms, for example) to a server. A page received by a Web browser may be embedded with references to other content (images, videos, sound clips, Flash applications, style sheets, and even another HTML page) that the Web browser will request in order to render the complete Web page.

This is all well-suited for user-initiated interactions, but this paradigm is also limited to user-initiated interactions, and is not suitable for systems that where other objects wish to initiate interaction with the user or wish to maintain a constant dialogue between browser and server. A simple real-world example is the telephone: if you wish to call someone, you dial their number, and the recipient decides whether to respond by answering the call. This is the same paradigm of the HTTP protocol. A browser attempts to open a call with the server, and the server responds if it accepts HTTP communications. A major difference is that telephones permit calls in the reverse direction, someone who wishes to call you is able to dial your number, giving you the opportunity to respond. On the Web, HTTP allows no way for the server to proactively contact your browser to send you a page that you didn't request. Plenty of crafty engineers have been put to work to build workarounds to enable such behavior. While there are viable solutions to some of these problemts, the fact remains that the Web is just not suited nor was it designed to do this.

A further limitation of HTTP is that, in general, connections are only maintained until the requested page or file is complete, after which the line is disconnected. In the telephone scenario, a received call establishes a connection that remains until one side hangs up or is disconnected. Imagine after making a call, you ask one question of the other party, and once they deliver their response, they politely hang up the phone. To ask another question, you have to call again, and for each question, you have to make an individual call. This is essentially what occurs with most Web servers.

This behavior actually makes a lot of sense in the context of typical Web browsing. A user will request a Web page via HTTP from a Web Server. Once the page has been delivered, the user will view the page for perhaps a number of seconds to a minute or even longer. Until the user clicks on a link, no further network communication is necessary, and the link may be with an entirely different server. In HTTP's intended mode of operation, maintaining the browser-server link is entirely unnecessary and ties up resources unnecessarily, which is especially bad if a server must simultaneously handle the requests for large numbers of browsers.

This makes HTTP, and thus the Web, ill-suited to applications where the client and server communicate in a dynamic dialogue. A common scenario is when the server maintains the state of a dynamically changing system, and the client (browser) displays a view of the server's state. Email clients periodically poll a server to see if new email has arrived. Another example would be the watching a live sporting event from the client. As points are scored and participants move around the field, the client would see the live score and could watch the action take place.

The ideal solution is for the server to pass state changes to client once it is aware of a change in state. When the server receives a notice that a team has scored, it immediately transmits this information to all it's clients that are watching this event. This solution is highly efficient, since communication transmissions only occur when the server state changes.

With a client-initiated call-and-response system, the common solution is that the client must periodically poll the server for information. To get adequate real-time information, the client must request information once-per-second, or perhaps faster. With multiple clients, this leads to heavy workloads for the server to service each client request every second. Worse, if the state has not changed, the client and server communicate that no change has occurred. Such communication provides no value to the client or server; the bandwidth and server cycles are wasted. Clients can tradeoff real-time accuracy of server state by decreasing the frequency of polls. At the far end of this tradeoff spectrum, a client is only sure of the accuracy of its state after a poll. For a system state in high flux, this accuracy could be gone a mere instant after the poll. All of these tradeoffs and inefficiencies can be avoided by staying away from HTTP in these scenarios.

I'm a high-use Web user, and so I appreciate the ability to receive digital media from a wide-range of content providers. It's easy for me to access that media; I just launch a browser while connected to the Internet, enter the Web address, and a second or two later, it's there. Due to the low barriers of entry and it's scalability, both home users and corporations can easily build a Web serving infrastructure. The simplicity of HTML allows content creators to get started easily making their first web pages, while advanced tools such as JavaScript, CSS, PHP, DHTML allow more ambitious creators to create and manage compelling content.

Despite the advancement of tools, the Web and HTTP was designed for a purpose and to operate in a specific way. With the best of intentions, engineers and corporations have attempted to reappropriate the Web to fit their every need, but there are limits to what can be done. The Web is founded on a two-dimensional, text foundation with a limited vocabulary of interaction. Also, the Web's underlying protocol, HTTP, is designed primarily for call-and-response, short-lived communications, instead of enduring dialogues. Engineers who wish to expand the Web's purpose or who need to make it work in ways beyond those originally intended face serious challenges, as they're essentially trying to make a square peg fit in a round hole.

There is much we can learn from the success of the Web and its design, however engineers are asked to shoehorn the Web to fit our every need. The Web is not and should not be asked to be the be-all end-all of the Internet.

Friday, February 13, 2009

C Exhausted, Moving On to D

So, I've been pining away for a programming language that emancipates us from the tyranny of C, has the modern features of Java and C#, while also keeping the familiarity and native compilation of C++. I want my three-tiered cake, and you can bet I'm going to eat it too.

I wanted to call it C-prime, E++, and eventually settled on C-prime-prime. Gone would be the relics of C: farewell to #define and your preprocessor ilk, bug-prone-and-confusing array handling would be slain, and never again would we be forced to write redundant declarations within headers files. We would honor the new traditions and better paradigms with operator overloading, exception handling, run-time type identification, and template programming.

At last, we reveal that D can be a suitable champion for this cause. With so many people using C++, it makes sense that some intelligent minority would seek to build a better C++, building on its strengths, while realizing the benefits and advanced made in newer languages and even of its own accord. D is not my holy-grail of languages, but after researching it, my most important desires for an improvement over C++ seem fulfilled.

First the D language discards the antiquated relics left behind by C:

D drops #define and the C preprocessor - a giant source of programming pitfalls, when better alternatives exist.
D does not require forward declarations - a detriment to efficient code writing; now we can lay code out naturally without writing additional and redundant lines of code, which simply make maintenance more difficult.
D does not have header files, which happen to be an even bigger impediment to efficiency; D requires no additional files with declarations of redundant code fragments and end up drawing out compilation time

In keeping with C++, D continues many of the C++ language's best features:

D supports object- and interface-oriented programming paradigms
D supports operator overloading (in a simpler, more efficient way than C++!)
D has templates as a meta-programming paradigm
D continues exception handling and run-time type information

Further, it brings the best features of Java and C# without the interpreted code that must be run on a virtual machine:

D compiles to native machine code, so once you've compiled, your computer is ready to chomp those bits
D adds a garbage collector; this is a feature I'm on the fence about, but it does have a significant boon to writing-efficiency and bug-catching. GC has its own realm of issues, but it should lessen the number of developmental pitfalls
D removes the need for pointers; pointers are still available, but passing objects is implicitly by reference.
D improves type checking, making things such as typedef create stronger types. Overall, D is a more strongly typed language than C++.
D supports synchronization as a language feature, instead of as a library feature
D increases security of arrays by tracking dimensions, and attempts to make array declaration more consistent and readable.
D differentiates between invariant objects -- objects whose data is absolutely read-only -- from those object which are constant -- objects whose data is obligated to only be read, regardless of whether the data underneath is modifiable by other means. Further reference semantics are far more sensical, and D offers more control of const-ness and invariance when multiple levels of indirection are required.

Of course, D is not without its drawbacks:

D chose to permit function-data programming paradigm, so data-driven, iterative programming remain viable. This is something that remains troublesome for very large projects, but is viable and valuable for smaller programs, so I can live with this.
D permits global variables. Globals are probably the number one thing a programmer can use to make a module more difficult to understand and debug, as it thwarts data abstraction and data-hiding motivations. I know banning it seems awfully authoritarian and extreme, but I can temper myself while adding a warning to programmers to be extremely wary of global variables.

On to the D language's more core issues:

D is young, meaning there are less tools and development environments that are "D-ready." You won't find Microsoft Visual D, but there are Eclipse open source plug-ins.
The D language's standard library is well, non-standard at the moment. There are two variants, Phobos and Tango, which are currently being integrated into D 2.0.
By bringing in Garbage Collection (mentioned earlier), there exist situations where you may need to maintain a weak reference to an object, when a full reference might result in a circular reference, meaning a reference that indirectly ends up referring back to the referrer making a full circle. In D, there is no built-in support for these weak references.
Finally, D is not backwards compatible with C++, in the way that we could have our own improved C++, while keeping everything from before. However, D does support linking with C libraries, while it does not fully support linkage with DLLs that use C++ link tables.

Overall, D seems to be designed with people like me in mind. It's far more efficient, it has numerous modern features intended for large-scale projects, and is practical, yet efficient in its intent, design, and implementation. For more information about the D programming language, check out Digital Mars:
http://digitalmars.com/d/2.0/index.html

and check out the Wikipedia page:

http://en.wikipedia.org/wiki/D_programming_language

Wednesday, February 11, 2009

A more Effective C++

As stated before, C++ has a lot going for it -- things I wouldn't want to reinvent. Compiler support, well-understood syntax, native run-time execution, the list goes on. There are parts I want to get rid of. Support of C-based programming idioms, header files, global variables, these things have got to go. Well, I take that back. We need the ability to support legacy code, but we don't have to like it.

Let's pretend we have a new evolution of C++ that let's us be more effective (intentional reference to a Scott Meyers book here). Let's call it C++ - C, or C' (hmm C Prime seems to be taken), or how about E++ (that's kinda taken too), too bad for someone. I'll go with C'' (C-Prime-Prime), for now, but I like C Prime better.

C'' supports:

Object-oriented and other class-based development techniques
Inclu[de]-sion of other declarations classes by name instead of header file
Templates and all other C++ goodies we know and love

C'' drops support for:

#define (fewer pitfalls for programmers to fall into)
Global variables (reduce coupling in classes and modules)
Loose procedures and functions (only supports class methods, increases potential for achieving cohesion)

C'' is still backwards compatible with C++ and C.

Screeeeetch! Contradiction. The past two "feature" sets are mutually exclusive. Aren't they? Okay, obviously, I have a trick up my sleeve or some nasty kludge to hit you with.

Not really. If our goal is to utilize and leverage the strength of the C++ language and tools, then let's do so. If we devise a C-prime-prime "compiler" to take as input our C-prime-prime file, and then output a C-plus-plus file, we simultaneously realize a number of significant benefits:

Permit a more efficient, header-less coding style
Allow us to restrict ourselves to the desired OO- programming idioms, while ridding ourselves of trouble-laden C idioms
We can re-use and leverage all existing C++ tools, processes, methodologies, build environment to build our binary executables
We support legacy C/C++ code side-by-side with our C'' code

So, our C'' "compiler" is closer to a preprocessor than a compiler. It takes a text input file, and generates two output files -- intermediary C++ source and headers file, which in turn are compiled by standard C++ toolsets and toolchains.

C'mon, this is so easy. Has anyone done this before?

Back to the Program... -ming.

Programming. I mean, time to get back to programming.

I woke up and showered before the time that my girlfriend, Chris, had to go to work. No, you don't understand! I don't have to go to work, and I've chosen not to have a job for nearly a year now. I normally can sleep in...and do.

Well, I am getting over a cold, so my sleep schedule is weird, but that's doesn't fully explain why. My brain is in overdrive (wail like a banshee on fake-plastic guitar controller). Hopefully, it'll all make sense eventually, and this initiative won't fall flat on its face.

My brain is overdrive. The software developing and engineering part. Perhaps I've finally recovered from working at Nameless Software Corporation for nearly eight years. Perhaps it's just a digital-spiritual calling. Whatever it is, these are some very interesting ideas, which will lead me to talk about the namesake of this blog. But that comes later. In due time.

For now, programming-talk. C++ is a great object-oriented language. I take that back. In the hands of strong programmers, it can be a great language. C++, laden with its need to be backwards-compatible with C, really thwarts its own ability to be a great object-oriented language on its own.

Things I like about C++:

Supports object- and interface-oriented programming paradigms (suited for large scale projects)
Widely-used (supported, has industry-standards)
Well-supported over nearly every operating system and chip architecture available (potential for programs to be written for a wide set of environments)
Compiles to native code (requires no intermediary run-time environment, better application performance)

But it also has features that I find dissatisfying:

Supports the procedural programming paradigm (doesn't scale well for complex projects)
Supports macros (a common source of bugs or compiler frustration, thwarts the type checker)
Backwards compatibility with C (an extension of the previous two points; this is not a bad thing in of itself, but there are now two ways to do many things: the C-way and the C++-way, which can cause confusion, and is a source of pitfalls when both ways are used)
Global variables (decreases code comprehension, flexibility, and abstraction, increases complexity and coupling, the reliance of one module on another)
Requires creation and maintenance of header files (decreased development efficiency)

There might be more issues with C++, but my point is obviously that I want to do something about this. There must be an answer.

Infinite Dimension