Colors of Twitter
By combining Twitter feed data (100 million tweets and counting) with a language detector, plus some geometric post-processing and many hours of coding, I was able to produce a fairly detailed picture of the language geography of the Twitter universe.
Generating GeoJSON lets me display the data on an interactive map.
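As a rough illustration of how language-tagged tweets could end up as GeoJSON, here is a minimal sketch. It is not the project's actual code: the fixed square grid, the 1-degree cell size, the function names, and the `language` property are all assumptions made for this example, and the sample points are invented. The real geometric post-processing is surely more involved.

```python
import json
from collections import Counter, defaultdict

CELL_DEG = 1.0  # assumed grid resolution in degrees; purely illustrative


def cell_of(lon, lat, size=CELL_DEG):
    """Return the integer grid-cell index containing a lon/lat point."""
    return (int(lon // size), int(lat // size))


def dominant_language_cells(tweets, size=CELL_DEG):
    """Map each grid cell to its most frequently detected language."""
    counts = defaultdict(Counter)
    for lon, lat, lang in tweets:
        counts[cell_of(lon, lat, size)][lang] += 1
    return {cell: c.most_common(1)[0][0] for cell, c in counts.items()}


def to_geojson(cells, size=CELL_DEG):
    """Turn {cell: language} into a GeoJSON FeatureCollection of squares."""
    features = []
    for (cx, cy), lang in sorted(cells.items()):
        x0, y0 = cx * size, cy * size
        x1, y1 = x0 + size, y0 + size
        # Closed ring in counterclockwise order, as GeoJSON recommends
        ring = [[x0, y0], [x1, y0], [x1, y1], [x0, y1], [x0, y0]]
        features.append({
            "type": "Feature",
            "geometry": {"type": "Polygon", "coordinates": [ring]},
            "properties": {"language": lang},
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical input: (longitude, latitude, detected language code)
sample = [
    (2.17, 41.38, "ca"),
    (2.15, 41.40, "ca"),
    (2.19, 41.39, "es"),
    (-3.70, 40.42, "es"),
]
collection = to_geojson(dominant_language_cells(sample))
print(json.dumps(collection, indent=2))
```

The resulting object can be loaded by any map library that reads GeoJSON, with the `language` property driving the fill color of each cell.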
PNG output is also useful for sharing highlights of specific regions:
Africa: Considering that Twitter users often choose the language with the largest audience, I am pleased by the language diversity that shows up in Africa, especially in the South, the East, and the Gulf of Guinea.
Catalan: The use of Catalan seems much stronger in Catalonia than in other Catalan-speaking territories like the Balearic Islands or the Valencian Country. The map seems to confirm an especially advanced state of linguistic assimilation in the latter case.
Europe: Most European language areas closely follow state boundaries, with the notable exception of minority languages such as Basque, Catalan, Galician, Norwegian Nynorsk, and Welsh. Belgium, which is split roughly but not exactly into two halves, is also an interesting case.
India: with English included (left) and with English excluded (right). A Twitter user pointed out that Hindi is the preferred lingua franca in Northern India, and English in the South.
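One simple way to get a "with" and "without" view like the India pair is to drop a language before picking the most frequent one in each region. The snippet below is only a sketch of that idea; the helper name `top_language` is hypothetical, the counts are made up, and the maps may well have been produced differently.

```python
from collections import Counter


def top_language(counts, exclude=()):
    """Return the most frequent language, ignoring any codes in `exclude`."""
    kept = {lang: n for lang, n in counts.items() if lang not in exclude}
    if not kept:
        return None  # nothing left once the excluded languages are removed
    return max(kept, key=kept.get)


# Made-up counts for a single region, for illustration only
region = Counter({"en": 50, "hi": 30, "ta": 5})
with_english = top_language(region)
without_english = top_language(region, exclude={"en"})
print(with_english, without_english)
```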
To learn more about this project, see: https://jperals.github.com/colors-of-twitter