Colors of Twitter

2019

By combining Twitter feed data (100 million tweets and counting) with a language detector, plus some geometric post-processing and many hours of coding, I was able to offer a fairly detailed look at the language geography of the Twitter universe.
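To give a rough idea of the aggregation involved, here is a minimal sketch in Python. It is not the project's actual code: it assumes each tweet has already been reduced to a (longitude, latitude, language code) triple by whatever language detector is used, and it swaps the real geometric post-processing for a plain square grid. The names `cell_of`, `dominant_languages`, `size_deg` and `min_tweets` are made up for this illustration.

```python
from collections import Counter, defaultdict
import math


def cell_of(lon, lat, size_deg=1.0):
    """Snap a coordinate to the south-west corner of its grid cell."""
    return (math.floor(lon / size_deg) * size_deg,
            math.floor(lat / size_deg) * size_deg)


def dominant_languages(tweets, size_deg=1.0, min_tweets=2):
    """Return {cell: language} for cells holding at least `min_tweets` tweets.

    `tweets` is an iterable of (lon, lat, lang) triples; the language code
    is assumed to come from an upstream detector (hypothetical input).
    """
    counts = defaultdict(Counter)
    for lon, lat, lang in tweets:
        counts[cell_of(lon, lat, size_deg)][lang] += 1
    return {
        cell: langs.most_common(1)[0][0]
        for cell, langs in counts.items()
        if sum(langs.values()) >= min_tweets  # skip cells with too little data
    }


# Invented sample points, only to exercise the function.
sample = [
    (2.17, 41.38, "ca"),
    (2.15, 41.40, "ca"),
    (2.90, 41.98, "es"),
    (-3.70, 40.42, "es"),
    (-3.68, 40.41, "es"),
    (-0.38, 39.47, "es"),
]
result = dominant_languages(sample)
```

The `min_tweets` threshold is one simple way to avoid coloring a cell on the strength of a single tweet; a real pipeline would likely need something more careful.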

By generating GeoJSON I can display the data in an interactive map:
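As an illustration of what such GeoJSON output might look like, the sketch below turns a few grid cells into a `FeatureCollection` of square polygons, each tagged with a language code. The cell layout, the `language` property name and the helper names are assumptions made for this example, not the project's actual format.

```python
import json


def cell_feature(lon, lat, size_deg, lang):
    """Build a GeoJSON Feature: a square polygon tagged with a language code."""
    ring = [
        [lon, lat],
        [lon + size_deg, lat],
        [lon + size_deg, lat + size_deg],
        [lon, lat + size_deg],
        [lon, lat],  # a GeoJSON linear ring must end on its first point
    ]
    return {
        "type": "Feature",
        "properties": {"language": lang},  # property name is an assumption
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }


def to_feature_collection(cells, size_deg=1.0):
    """Convert {(lon, lat): lang} into a GeoJSON FeatureCollection dict."""
    return {
        "type": "FeatureCollection",
        "features": [
            cell_feature(lon, lat, size_deg, lang)
            for (lon, lat), lang in sorted(cells.items())
        ],
    }


# Invented cells keyed by their south-west corner.
cells = {(2.0, 41.0): "ca", (-4.0, 40.0): "es"}
collection = to_feature_collection(cells)
geojson_text = json.dumps(collection)
```

A web map library can then load `geojson_text` and color each polygon by its `language` property.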

PNG output is also useful for sharing highlights about specific regions:

Linguistic map of Africa

Africa: Considering that Twitter users often choose the language with the biggest audience, I am happy about the language diversity that shows up in Africa, especially in the South, the East, and the Gulf of Guinea.

Map of the Catalan language

Catalan: The use of Catalan seems much stronger in Catalonia than in other Catalan-speaking territories like the Balearic Islands or the Valencian Country. The map seems to confirm an especially advanced state of linguistic assimilation in the latter case.

Linguistic map of Europe

Europe: Most European language areas closely follow state boundaries, with the notable exception of minority languages like Basque, Catalan, Galician, Norwegian Nynorsk, Welsh... Belgium, which is roughly but not exactly split into two halves, is also an interesting case.

Languages of India including English
Languages of India excluding English

India including English (left) and excluding English (right). A Twitter user pointed out that Hindi is the preferred lingua franca in Northern India, and English in the South.

To learn more about this project, see: https://jperals.github.com/colors-of-twitter