The Fallacy of Gender-Based Definitions of Language Use … Or … Your Number Crunching is Tosh

While digging through some archived material online I found a reference to something called “The Gender Genie”. This online gender evaluation tool is at least a couple of years old now, and distinctly reminds me of the “gender tests” one used to be able to find all over the place online ten to fifteen years ago. I never cared for them then, and I’m not particularly fond of this thing, but I was curious as to how it worked, so I tried it.

One enters a body of text (the system claims it works better with text of more than 500 words) and the system attempts to determine the gender of the author by scanning for keywords and tallying their use. Unfortunately, the creators of this test have chosen to assign keywords male and female values, which is always a mistake since English terminology is gender neutral.

However, the idea behind this test isn’t that certain terms are by themselves male or female, as would be the case in a language like, say, French, where the gender weight a term carries does in fact alter terminology and the structure of phrasing, and the gender of the speakers themselves determines which terms can be used in sentence construction. No, instead the idea revolves around the notion that the use of specific terminology can be linked to masculine and feminine usage.

Or, more specifically, that a set body of terms can be chosen and ascribed with male and female correlative associations. The list goes something like this:


Feminine Keywords

with, if, not, where, be, when, your, her, we, should, she, and, me, myself, hers, was


Masculine Keywords

around, what, more, are, as, who, below, is, these, the, a, at, it, many, said, above, to


Kindly note the imbalance: terms such as “him” or “he” are missing from the Masculine list, even though their counterparts appear in the opposite list. Also note the (in practice) specious assumption made by the developers of this list that the use of both prepositions and pronouns is a solely masculine activity. The assumption that female writers are uniformly passive while male writers are uniformly active is equally absurd.

The above list declares its bias in boldly stereotypical (and rather misogynistic) tones by asserting that women write softer, passive, less engaging or involved verse, while men create active, removed passages. There is also an unstated assertion – made by the terms chosen here – that women write about individuals and personal or reflective verse while men write about the world around them and are less concerned with the landscape of the internal.

Also interesting to note is the belief posited that women are chiefly concerned with the “where” of things, while men care chiefly for the “what” and the “who”. Apparently nobody cares “why” or “how” according to this model.

The list of terms above is also weighted with a point value system, as follows:


Feminine Keywords

with (52), if (47), not (27), where (18), be (17), when (17), your (17), her (9), we (8), should (7), she (6), and (4), me (4), myself (4), hers (3), was (1)


Masculine Keywords

around (42), what (35), more (34), are (28), as (23), who (19), below (8), is (8), these (8), the (7), a (6), at (6), it (6), many (6), said (5), above (4), to (2)


Again there is disparity in the weighting system. The ranges are not remotely equivalent, even where there are properly comparable terms in evidence on the two lists.

The numerical values seem to exist solely to facilitate the necessary step of determining the author’s gender through an arcane set of calculations intended to reinforce the list creator’s decision to base their formula on percentile usage of terms. These numbers are presented to us post calculation as a way of inferring the list creator’s bona fides. You scored X. You are clearly Y.
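For the curious, the kind of calculation I suspect is going on can be sketched in a few lines of Python. This is only my guess at the mechanics, not the Genie's actual code: I am assuming each keyword simply adds its point value to the matching score every time it appears, and that the higher total wins. The tokenizer, the function names and the tie-breaking rule are all my own inventions for illustration; only the keywords and weights come from the lists above.

```python
import re
from collections import Counter

# Keyword weights copied from the two lists above.
FEMININE = {"with": 52, "if": 47, "not": 27, "where": 18, "be": 17,
            "when": 17, "your": 17, "her": 9, "we": 8, "should": 7,
            "she": 6, "and": 4, "me": 4, "myself": 4, "hers": 3, "was": 1}
MASCULINE = {"around": 42, "what": 35, "more": 34, "are": 28, "as": 23,
             "who": 19, "below": 8, "is": 8, "these": 8, "the": 7,
             "a": 6, "at": 6, "it": 6, "many": 6, "said": 5,
             "above": 4, "to": 2}


def score(text):
    """Return (female_score, male_score) for a passage.

    Assumption: every occurrence of a keyword adds its full weight.
    """
    # Crude tokenizer (my own choice): lowercase runs of letters.
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    female = sum(weight * counts[word] for word, weight in FEMININE.items())
    male = sum(weight * counts[word] for word, weight in MASCULINE.items())
    return female, male


def guess(text):
    """Pick whichever score is higher; a tie falls to 'male' arbitrarily."""
    female, male = score(text)
    return "female" if female > male else "male"


print(score("with the"))          # (52, 7)
print(guess("If she said what"))  # female
```

Under these assumptions a single “with” outweighs seven uses of “the”, which is why a short passage can swing so easily on one or two words.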

So, what happened when I put in the full body of text from several of my works? The algorithm returned a correct answer every time, guessing without fail that I was a male writer. The reason, much as I decry the assumptions made by the argument, has to do with the following statement made by the developers of the algorithm that produces the “Gender Genie”’s results in their paper “Gender, Genre, and Writing Style in Formal Written Texts”:



This paper explores differences between male and female writing in a large subset of the British National Corpus covering a range of genres. Several classes of simple lexical and syntactic features that differ substantially according to author gender are identified, both in fiction and in non-fiction documents. In particular, we find significant differences between male- and female-authored documents in the use of pronouns and certain types of noun modifiers: although the total number of nominals used by male and female authors is virtually identical, females use many more pronouns and males use many more noun specifiers. More generally, it is found that even in formal writing, female writing exhibits greater usage of features identified by previous researchers as “involved” while male writing exhibits greater usage of features which have been identified as “informational”. Finally, a strong correlation between the characteristics of male (female) writing and those of nonfiction (fiction) is demonstrated.


While I for one am inclined to find the above somewhat too simplistic in its approach to gender usage, it does raise, for me particularly, an interesting point. If the system requires a minimum of 500 words to produce an accurate sampling, what happens when you feed in less than the required input? Predictably, the numbers go crazy. Witness what happened when I put in only the first paragraph of three very different pieces of mine:



Female Score: 135
Male Score: 125
The Gender Genie thinks the author of this passage is: female!


Female Score: 99
Male Score: 77
The Gender Genie thinks the author of this passage is: female!


(NOTE: The genie works best on texts of more than 500 words.)

Female Score: 31
Male Score: 152
The Gender Genie thinks the author of this passage is: male!


Utilizing a much smaller sampling of material, the algorithm determines my style of writing to be 66% feminine. Or, taken in context, I start my pieces in a feminine mould and continue on, or conclude, in a masculine vein.
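The arithmetic behind that figure is nothing more than a tally of verdicts, which a short Python sketch makes plain. The three score pairs are the ones reported above; the function name and the "higher score wins" rule are my own assumptions about how the verdict is reached.

```python
# (female_score, male_score) for the three first-paragraph samples above.
RESULTS = [(135, 125), (99, 77), (31, 152)]


def verdicts(results):
    """Label each sample by whichever score is higher (assumed rule)."""
    return ["female" if female > male else "male" for female, male in results]


labels = verdicts(RESULTS)
female_share = labels.count("female") / len(labels)

print(labels)        # ['female', 'female', 'male']
print(female_share)  # two out of three, i.e. roughly two thirds
```

Two "female" verdicts out of three is where the 66% comes from (67% if you round rather than truncate). Note how thin the margins are in the first two samples, 10 and 22 points, against 121 in the third.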

I’ve never been much of one for number crunching when it comes to language. I am aware that it’s a popular pastime, but the real problem I have with this kind of “gender test” is that, numerically right or wrong, it reinforces already oppressive gender stereotypes not just in literature, but about the people who write it.

Making intellectual or societal determinations based solely on gender – especially the gender that underlies an argumentative tract – is a dangerous thing indeed. Not only does this kind of thinking demand that we adhere to models rooted in, and evidenced by, past treatises or works, it also, in the same breath, denies us the right to move forward by declaring that any future deviation from the normative model is an aberration (or an outlier if you’re a zeitgeist groupie).

Perhaps, for me, this attitude is also especially irksome since I have been trying, through my writing and other methods, to address issues of weighted or imbalanced representation of race and gender in fiction. Spec Fic in its many incarnations has (classically) been riddled with a profusion of white male protagonists. Powerful female protagonists and main characters whose skin tone was not synonymous with the calla lily have always fascinated me. Do I uniformly avoid utilizing white male protagonists in my own work? Nope. Don’t always swing it.

But let me tell you this. Of the three pieces I have sold thus far, two have featured female protagonists of undisclosed race, and the last has a black male protagonist.

I may not be rocking the world off its axis, but you damn well better believe I’m trying to shift the landscape one mountain at a time.


