
The Zero Percent Solution

Published: December 29, 2010

Significant portions of the difficulty experienced in writing revolve around determining the taxonomy of the finished document. Recently, assistance with this confounding vexation has arrived via an observation that pronoun-laden text is more suitable for the blog format, thereby implying that pronoun-parsimonious text is more applicable to the article format.

Initial ruminations surrounding this concept quite naturally led to the expedient acquisition of data as the preeminent concern. In their natural habitat, pronouns are easily discernible from other parts of speech. As such, an appropriately crafted script should be able to sift through voluminous texts and report accurate data pertaining to the particular inhabitants therein who maintain an affirmative position vis-à-vis nouns. However, a multitude of scripting languages exists, and while selecting the appropriate language would facilitate the developmental effort, an inappropriate decision could severely retard the advancement of this momentous undertaking.
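For the like-minded investigator who prefers to craft rather than discover, a minimal sketch of such a script follows, assuming Python and a hypothetical, hand-picked list of personal, possessive and reflexive pronouns. It is not the online tool subsequently discovered, and its figures should not be presumed to match that tool's output.

```python
import re

# Hypothetical, deliberately incomplete pronoun list (personal, possessive,
# reflexive). Demonstratives such as "that" are omitted because a plain word
# list cannot tell the pronoun from the conjunction or determiner.
PRONOUNS = {
    "i", "me", "my", "mine", "myself",
    "you", "your", "yours", "yourself", "yourselves",
    "he", "him", "his", "himself",
    "she", "her", "hers", "herself",
    "it", "its", "itself",
    "we", "us", "our", "ours", "ourselves",
    "they", "them", "their", "theirs", "themselves",
}

# A word is a run of letters, optionally followed by one apostrophe suffix
# (so "it's" is one token and is not counted as the pronoun "it").
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def pronoun_density(text: str) -> float:
    """Return pronouns as a percentage of all words in the text."""
    words = WORD_RE.findall(text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in PRONOUNS)
    return 100.0 * hits / len(words)


if __name__ == "__main__":
    print(pronoun_density("She said it was mine."))
```

A word list is a blunt instrument; a faithful census of pronouns would require a part-of-speech tagger, which is precisely the sort of developmental effort the choice of language was meant to ease.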

While assiduously engaged in said exploration, quite serendipitously, a virtual location revealed functionality easily adaptable to the task at hand. This information is accessible to any like-minded investigator so inclined as to point a browser to the following locale:


This deft tool was applied liberally to musings at this selfsame website and immediately generated tangible results. Initial efforts focused on featured articles for this date, 12/29/2010. A clear and present trend revealed itself. One senior editor emerged admirably as a consistent 4% pronoun producer. The other featured author registered a respectable 6% rate.

Chronologically, the last 10 articles posted ranged from 4% to 8% pronoun density. The majority of writing posted under a single byline presented a fairly tight spread (i.e., a range of 6% to 8%). One could almost assert that individual authors displayed a distinctive personal artistic signature via this metric. Looking back at prior products generated by this space, the spread was considerably more expansive, defining a range from 4% to 8%. The most egregious use of pronouns occurred in a possibly misclassified literary review, a document centering upon the manuscripts of one Mary Roach. The remaining documents were within the tolerance limits for typical documents classified as articles.

Having traversed quite possibly a statistically significant sample of the articles, it was time to place the mathematical magnifying glass above a sampling of blog posts and potentially uncover the remaining piece of the puzzlement. Once again, accolades are in order for the editorial staff. An early blog post on Google Chrome OS generated a practically parched pronoun count of 1%. Further review of blog text proved equally perplexing relative to the pronoun proportions presented. The blogs sampled returned pronoun densities ranging from 1% to 7%. Considering that articles concentrated themselves in the 4% to 8% range, a bright-line solution such as “4-6% is an article, 6-8% is a blog” not only did not emerge, but seemed to fade much like gorillas into the mist.

Bemused and well-nigh befuddled by this untimely turn of events, a reconsideration of the disjointedness of the data relative to the observation was imperative. Such reconsideration led to the possibility, nay, probability, indeed near certainty, that low pronoun density, while arguably a necessary condition to codify a document as an article, was most assuredly not the necessary and sufficient condition required for formal validation of the thesis.

An applied mathematician with obsessive-compulsive disorder, a questionable premise and an apparently functional diagnostic tool is certain to turn towards the light in such dark times. Indeed, luminous rotation transpired. Returning to the tool for additional threads to pull, a triumvirate of beacons was found burning brightly: the Complexity Factor (Lexical Density), Readability (Gunning-Fog Index) and Readability (Alternative) beta. For analytical purposes, these are abbreviated as CF, GF and RB. A hypothetical formula for utilizing this data to validate an article is presented as:

LPI = (CF * GF) / RB, where LPI is defined as the lexicographical prophylaxis index.

Running those numbers on this document (to this point) shows

CF=83.1% (revisions could increase the complexity by less than 17%)

GF=12.5 (most articles score in the 4-6 range)

RB=15.7 (20 is hard, 100 is easy, this is dense)

Inserting the subject data into the formula resulted in an LPI of 66.16. Running the same numbers on a random article and turning the figurative crank resulted in an LPI of 5.88.
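The figurative crank admits a trivially small implementation. The sketch below, in Python, assumes CF is supplied as a percentage figure (83.1 rather than 0.831), which is the interpretation that reproduces the 66.16 quoted above; the function name is hypothetical.

```python
def lpi(cf_percent: float, gf: float, rb: float) -> float:
    """Lexicographical prophylaxis index: (CF * GF) / RB.

    cf_percent: Complexity Factor as a percentage value (83.1 for 83.1%).
    gf: Gunning-Fog Index.
    rb: Readability (Alternative) beta score.
    """
    if rb == 0:
        raise ValueError("RB must be non-zero")
    return cf_percent * gf / rb


# This document's figures, as reported by the tool.
print(round(lpi(83.1, 12.5, 15.7), 2))  # → 66.16
```

Entering CF as a fraction (0.831) would instead yield roughly 0.66, so the units chosen for CF govern the scale of any threshold one might later propose.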

While this space, with or without culpability or malfeasance, may have previously misclassified a blog as an article, it is self-evident that in the present tense, this assembly of words is most certainly

A definite article.
