Centrality and Centralisation

A Social Network Analysis of the Early Soviet Film Industry, 1918-1953

Joan Neuberger
The Soviet film industry, like any other institution, was made up of networks of people who knew each other, or who knew people who knew each other. In this article, Joan Neuberger examines some of those relationships using digital social network analysis. Applying digital network analysis to the connections between the directors and actors working in Soviet film during the period 1918-1953, she shows, first, some of the benefits of obscurity, and second, that changes in ethnic and regional integration during this period offer a different picture of centralisation than a study of political centralisation.
Soviet Union; Soviet republics; Soviet film; networks; centralisation; centrality; Digital Humanities.


The Soviet film industry, like any other institution, was made up of networks of people who worked together, or who worked with people who had worked together. In this article, I analyse some of those connections using digital social network analysis (SNA). Histories of Soviet film have usually concentrated on the role of state censorship and ideology in shaping artistic choices, often through studies of individual directors. More recently, though, attention has shifted somewhat to include studies of the institutions and technologies of film production.1 This extensive literature generally shows that, during its first decades, the history of Soviet film production was a history of almost continual shifts in priorities, funding possibilities, political requirements, technologies, and industry leadership. As the political system centralised under Stalin, so too did the film industry, and both developments had profound effects on the kinds of films that were made and how they were made. Social network analysis of connections between the people involved in film making gives us a new perspective on the impact of these shifting conditions on the ways films were made. By looking at a few specific measures of work connections among a very large number of people, we gain new insights into the social structures of Soviet film production and raise important new questions for future research.

One of the main functions of SNA is to measure what is called ‘centrality’ by mapping the connections between individuals and identifying points in a network that have high degrees of connection with others. Centrality in a social network, however, differs somewhat from centrality in a political system: while social networks have dynamics that intersect with politics, they have dynamics of their own that develop independently. My goal in this article is to show the ways political centralisation and network dynamics intersect in the early Soviet film industry.

Digital Social Network Analysis

Social network analysis generates new insights and questions by measuring connections between individuals. People (like animals, trees, and fungi) are social beings: we thrive, we fail, we communicate, and we grow through our connections with others. Those connections can take many forms – kin, friend, acquaintance, enemy, colleague, assistant, or boss, political ruler or media celebrity, pets or plants – and they can be of varying intensity and significance – strong, weak, harmonious, contested, obligatory, or voluntary. Together these crisscrossing social connections between individuals create networks, and those social networks have been at the center of the study of who we are, alone and together, at least since people started writing down their thoughts.

Analysis of these networks is not new, but the rise of professional sociology in the nineteenth and early twentieth centuries saw the development of efforts to study networks empirically and mathematically. Early sociological network analysis departed from other forms of social research in focusing on the structures of connection linking individuals rather than on the behavior or beliefs of individuals or groups (Freeman 2004: 2). For example, sociologists used SNA to study the distribution of friendship and kin relationships by counting the number of people who knew each other or were related to each other in a specified location, and to study the relative importance of competing trading centers in a given region by counting the number of journeys taken from export centers to import centers (Freeman 2004). These connections, which seem quite simple, offer perspectives on such important aspects of social interaction as information flows, the relative ability to influence others, the distribution of groups into discrete communities, and the degree of integration within groups. As a method for studying human relationships, the measurement of specific connections between a discrete number of people or places made social network analysis conducive to mathematical analysis and then to computerisation (Kadushin 2012, Watts 2003). The advent of personal computers, access to large sets of data via the internet, and programs that simplified data processing have made the tools of social network analysis widely available.

Today social network analysis has applications in all fields of the humanities and social sciences as well as in health care, urban planning, marketing, human resource management, and other fields. In health care, for example, contact tracing, so essential to containing the spread of infectious diseases, is based on the fundamentals of social network analysis, and SNA programs are helping health professionals track and maintain the records they collect about the potential spread of disease from one person to another (Chen 2011, Valente 2017). In the humanities, one of the most well-known SNA projects is “Six Degrees of Francis Bacon,” which maps approximately 88 million social connections between individuals in early modern Britain, with the statesman and philosopher, Francis Bacon, at the center. The title is a reference to the theory that all the people on earth are linked in six or fewer “friend of a friend” connections (also known as degrees of separation) to each other. This theory, originally proposed by Frigyes Karinthy in 1929, led to the popular 1990s game, “Six Degrees of Kevin Bacon,” in which people would try to establish the number of connections between any Hollywood actor and the prolific Kevin Bacon. Anyone who acted in a film with Bacon receives a score of ‘one’ (or one degree of separation); anyone who never acted with Bacon but acted in a film with someone who did act with Bacon receives a score of ‘two,’ and so on (Watts 2003: 93, Kadushin 2012: 108-34). In scholarly film studies, such SNA measures seem to be used most often to try to predict film success by measuring the person-to-person dissemination of media attention or by studying patterns of interaction among characters, rather than to study working connections in film production history as I propose here.2
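
The scoring rule of the Kevin Bacon game can be made concrete in a few lines of Python. The sketch below is an illustration only and is not code used in this study: the film titles and names are hypothetical, and it assumes that a “connection” simply means being credited on the same film. Scores are assigned by a breadth-first search outward from the chosen center.

```python
from collections import deque

# Hypothetical toy data: each film maps to the people credited on it.
FILMS = {
    "Film A": ["Bacon", "Actor 1", "Actor 2"],
    "Film B": ["Actor 2", "Actor 3"],
    "Film C": ["Actor 3", "Actor 4"],
    "Film D": ["Actor 5"],  # shares no film with anyone else
}


def build_co_credit_graph(films):
    """Link every pair of people who are credited on the same film."""
    graph = {}
    for cast in films.values():
        for person in cast:
            graph.setdefault(person, set()).update(p for p in cast if p != person)
    return graph


def degrees_of_separation(graph, source):
    """Breadth-first search: steps from `source` to each reachable person."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    return distance


graph = build_co_credit_graph(FILMS)
scores = degrees_of_separation(graph, "Bacon")
for person, score in sorted(scores.items(), key=lambda item: item[1]):
    print(person, score)
```

In this toy example “Actor 5” receives no score at all, because no chain of shared films leads back to the center; real networks may likewise contain people who cannot be reached.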

Since the beginning of social network analysis, data visualisation has been a key component. According to Linton Freeman, a sociologist and historian of SNA, it was Jacob Moreno, one of the pioneers in the field, who invented the basic visual form that all SNA studies still take, including the 88 million connections in “Six Degrees of Francis Bacon.” Moreno began by thinking about how to identify a single relationship that might connect two individuals in a way that could then be scaled up to measure large numbers of the same kind of connection. He diagrammed these connections using points to represent social actors and lines to represent their connection (Freeman 2000). Moreno viewed these graphs as more than simple representations or solely methods of presenting research. They were “first of all a method of exploration” (Freeman 2000). Moreno understood that visualisation made abstract concepts more concrete, which allowed researchers to develop an understanding of ideas or see patterns that might be more difficult to extract from a large data set than from a visual graph. As I will demonstrate below, I found this to be the case in my own research: patterns in the data that I was able to see first in the visualisation raised key questions that I would then test in the corresponding data and historical sources.

In general, the shift from qualitative humanities research to big data quantitative research not only involves new methods and technologies, it requires a profound shift in modes of thinking. To do SNA, you have to learn to think like a computer. Computers, of course, do not ‘think,’ but they process information in ways that are distinctly different from the ways scholars in the humanities are accustomed to thinking. Getting a PhD in history, for example, requires us to learn some new skills, but they’re mostly refinements of skills we’ve been practicing since we started going to school. Digital history, on the other hand, demands a different way of thinking. As historians (or humanists more generally), we often think about people in contexts that involve multiple, overlapping interactions of various kinds: social, political, cultural, and economic, to put it in the simplest terms. The quantifiable interactions that computer programs (and network analysts and computer scientists) usually work with must be discrete units, broken down to small, clearly definable attributes. As one network specialist, Scott Weingart, puts it: “Humanistic data are almost by definition uncertain, open to interpretation, flexible and not easily definable. Node types (nodes are fundamental units of networks) are concrete; your object either is or is not a book. Every book-type thing shares certain unchanging characteristics” (Weingart 2011). For dealing with large numbers of films and large numbers of people, turning qualitative texts into quantitative data – like who worked with whom or how often individual words are spoken – is worthwhile because the large scale of those phenomena, combined with the speed with which computers can process them, creates new kinds of knowledge and raises new questions.
These data points may look simple (or simplistic) to humanists accustomed to thinking about art works or production processes in all their complexity, but it’s not simple to think effectively in these terms. To translate familiar kinds of nuanced, overlapping contexts and interactions into meaningful digital structures that yield new insights requires new modes of thinking. On the other hand, Miriam Posner has argued that while humanists often need help with data modeling in order to organise large bodies of information in ways that make a computer produce something useful, it is equally true that computational digital scholars need help from humanists to make a computer produce something meaningful (Posner 2015).

First Steps

The present study builds on a preliminary social network analysis done in 2013 by Seth Bernstein, who examined social networks in the Soviet film industry by making a database of the artists who worked on films between 1918 and 1991 (Bernstein 2013). This database was derived from the lists on the Russian website kino-teatr.ru, which include everyone listed as cast and crew on individual films: directors, actors, producers, cinematographers, and others. Bernstein was also interested in measuring the connectedness of people who worked in Soviet film: who worked with whom? He showed, first, that the Soviet film industry was densely connected. Almost everyone in his data set was connected in six steps or fewer, but the average degree of separation between people is actually much lower than six, at 2.76, and the majority of Bernstein’s film people are connected by 3 or fewer degrees (meaning that the majority of people who worked on films between 1918 and 1991 were connected to each other through no more than two intermediaries). Bernstein also employed what are called “weighted” measurements that take into account not only whether people worked together, but how often they worked together; emphasis on those repeated ties indicates an even more densely connected film industry overall. Bernstein showed that the most highly connected people who also had higher rates of repeated ties were neither acclaimed directors nor beloved actors, but the journeymen (men, specifically: women rarely rose to high connectivity). The most highly connected people were those who worked a lot over a long period of time and in a variety of roles (director, actor, writer, for example), with people who themselves were highly connected. As Bernstein put it: “What this calculation of network centrality measures is not necessarily fame but rather position… a different kind of influence (maybe banal) lost with qualitative sources. Those sources tend to focus on the brightest figures but don’t register those ubiquitous people who stood out less” (Bernstein 2013). This conclusion indicates that centrality in a social network is different from centrality in a political system or popularity in a social group.

In his study of social connectedness, Duncan J. Watts writes that the study of networks shows us that fame and high status, traditional markers of centrality, do not always drive events or control institutions. “In a multitude of systems from economics to biology, events are driven not by any preexisting center but by the interaction of equals” (Watts 2003: 53). We might consider the range between economics and biology to be rather narrow, but this insight drives many productive questions about networks, including my own. What kinds of networks do we find if we think about centrality, or the condition of being highly connected mathematically, as equally important as fame? How important is it to be highly connected, that is, to work with the greatest number of people, and what can that tell us about social structures?

My project looks into the connections Bernstein studied in more detail in order to break into the density that he found and examine the significance of some of those well-positioned people, the specific people they were connected to, and the “interactions of equals” that took place outside the centers of fame and authority. I also studied a shorter chronological period than Bernstein, 1918–1953, in order to measure changes over time. I limited my study to connections between directors and actors, because these are the only categories that kino-teatr.ru lists systematically for every film made during this period. Directors and actors hardly constitute the total group of people involved in making a single film, but they are the public face of filmmaking, and were often in high demand by studios, which makes their connections significant in their own right. Although limiting myself to directors and actors means that I will not be examining Soviet film production in its entirety, this introductory study of a limited number of connections brings out unexpected and revealing connections between people.

Directors and Actors in Aggregate, 1918–1953

My first database includes film directors and actors who worked together on a film at some point during the whole period of the study, 1918–1953.3 We then broke that database down to make individual databases for the periods that roughly correspond to political and film industry shifts: 1918–1923, 1924–1929, 1930–1935, 1936–1940, 1941–1946, and 1947–1953 (the year Stalin died).4 The first question is: How many people are we studying? People in social networks are referred to as “nodes” and the connections between them are referred to as “edges.” I used the SNA program Gephi (gephi.org) to analyse and visualise the data. Gephi showed that there were a total of 4810 people or nodes with 78,056 connections or edges among them.
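
The step from film credits to nodes and edges, and from the aggregate database to the period databases, can be sketched as follows. This is a minimal illustration rather than the procedure actually used for this study: the records are hypothetical, and the sketch assumes that every pair of people credited on the same film is linked, with an edge weight equal to the number of films they share.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records: (film title, release year, credited directors and actors).
CREDITS = [
    ("Film A", 1924, ["Director X", "Actor 1", "Actor 2"]),
    ("Film B", 1926, ["Director X", "Actor 1"]),
    ("Film C", 1938, ["Director Y", "Actor 2", "Actor 3"]),
]


def build_network(credits, start, end):
    """Return (nodes, weighted edges) for films released from `start` to `end` inclusive."""
    nodes = set()
    edges = Counter()
    for _title, year, people in credits:
        if not start <= year <= end:
            continue
        nodes.update(people)
        # Sorting makes (a, b) and (b, a) count as the same undirected edge.
        for pair in combinations(sorted(set(people)), 2):
            edges[pair] += 1  # weight = number of films the pair shared
    return nodes, edges


nodes_all, edges_all = build_network(CREDITS, 1918, 1953)
nodes_1920s, edges_1920s = build_network(CREDITS, 1924, 1929)
print(len(nodes_all), "nodes and", len(edges_all), "edges for 1918-1953")
print(len(nodes_1920s), "nodes and", len(edges_1920s), "edges for 1924-1929")
```

A repeated collaboration, such as “Director X” and “Actor 1” in the toy data, shows up as a single edge with a weight of two rather than as two edges.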

Periods, nodes, edges.

Gephi analyses those edges (or connections or degrees of separation) in a number of useful ways. At the simplest level, it calculates the number of connections between any two nodes or people and it weights those degrees for repeated connections (as Bernstein did). As the reader can see in figure 2, the average number of steps between nodes was just a bit higher, meaning the paths were just a bit longer, for the 1918–53 period than Bernstein found for the whole Soviet period 1918–91: just over 3. That means that actors and directors who worked on films in the period 1918–1953 were, on average, separated by slightly more than two intermediaries (an average path length of just over 3), compared with an average of 2.76 for the period 1918–1991. I’ll come back to the reasons for this disparity below.
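
For readers who want to see what an “average number of steps” involves, the following sketch computes it for a hypothetical four-person chain. It is not Gephi’s own routine: it ignores edge weights and averages the shortest path over every pair of people who can reach each other, which is one common convention.

```python
from collections import deque

# Hypothetical undirected network: A - B - C - D in a single chain.
NETWORK = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
}


def steps_from(graph, source):
    """Shortest number of steps from `source` to every reachable node."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    return distance


def average_path_length(graph):
    """Mean shortest-path length over all pairs that are connected at all."""
    total = 0
    pairs = 0
    for source in graph:
        for target, steps in steps_from(graph, source).items():
            if target != source:
                total += steps
                pairs += 1
    return total / pairs if pairs else 0.0


print(round(average_path_length(NETWORK), 2))
```

A path of three steps, such as the one from A to D, passes through two intermediaries (B and C), which is the sense in which an average just over three implies slightly more than two intermediaries.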

1918–53: periods and paths.

Gephi can also show specific kinds of weighted connections, that is, connections between individuals that show rates of connectivity. The networks produced by these weighted connections can show individuals who are especially highly connected in various ways, or what SNA scholars call their “centrality.” The type of centrality I’m most interested in here is Eigenvector centrality, which measures a person’s connections to people who are themselves highly connected. In addition to the number of connections any one person has or the length of the path between individuals – the number of steps it takes to link people – Gephi can tell us not just who has the most connections, but who has the most connections that matter. This kind of centrality does not necessarily point to the most famous or powerful actors and directors. In so doing, it indicates a different definition of “centrality” (Kadushin 2012, Freeman 1979).
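
The difference between having many connections and having connections that matter can be illustrated with a small, entirely hypothetical network. The sketch below uses power iteration, a standard textbook way of approximating Eigenvector centrality; it is not Gephi’s implementation, it ignores edge weights, and the names (“J” for a journeyman, “S” for a star, “P1” to “P3” for the star’s otherwise unconnected partners) are invented for the example.

```python
import math

# Hypothetical network. A, B, C, D all work with one another. J works with
# A, B and C. S works with D and with P1, P2, P3, who have no other ties.
NETWORK = {
    "A": {"B", "C", "D", "J"},
    "B": {"A", "C", "D", "J"},
    "C": {"A", "B", "D", "J"},
    "D": {"A", "B", "C", "S"},
    "J": {"A", "B", "C"},
    "S": {"D", "P1", "P2", "P3"},
    "P1": {"S"},
    "P2": {"S"},
    "P3": {"S"},
}


def eigenvector_centrality(graph, iterations=200, tolerance=1e-10):
    """Power iteration: a node's score is the rescaled sum of its neighbours' scores."""
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        summed = {node: sum(scores[n] for n in graph[node]) for node in graph}
        norm = math.sqrt(sum(value * value for value in summed.values())) or 1.0
        updated = {node: value / norm for node, value in summed.items()}
        change = sum(abs(updated[node] - scores[node]) for node in graph)
        scores = updated
        if change < tolerance:
            break
    return scores


scores = eigenvector_centrality(NETWORK)
for node in ("J", "S"):
    print(node, "connections:", len(NETWORK[node]), "score:", round(scores[node], 3))
```

In this toy network S has more connections than J (four against three), yet J receives the higher score, because J’s collaborators are themselves well connected while most of S’s are not.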

Second, Gephi does a very useful division of nodes into what it calls “communities.” It can count up all the connections between individuals and indicate relative ratios of connectivity, which results in distinguishing groups of nodes with closer connections to each other than they are to nodes in the network as a whole. Some of these groups or communities are even more closely, densely connected than other communities and Gephi analyses these communities for degrees of that density, which it calls “modularity.” Modularity compares degrees of community connectivity (Meeks 2011). Gephi also turns these measures into more intuitive data visualisations that make these abstract calculations more concrete and comprehensible.
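
One widely used way of putting a number on how well a division into communities fits a network is Newman’s modularity score, usually written Q, which compares the share of edges that fall inside communities with the share one would expect if edges were placed at random. The sketch below computes Q for a hypothetical network of two triangles joined by a single edge. It only scores a division that is supplied by hand; it does not search for communities, and it is offered as an illustration of the idea rather than as a description of Gephi’s internal calculation.

```python
# Hypothetical network: two triangles (A, B, C) and (D, E, F) joined by one edge.
EDGES = [
    ("A", "B"), ("B", "C"), ("A", "C"),
    ("D", "E"), ("E", "F"), ("D", "F"),
    ("C", "D"),
]


def modularity(edges, community_of):
    """Newman's Q for an unweighted, undirected network and a given division."""
    total_edges = len(edges)
    degree_sum = {}   # community -> sum of the degrees of its members
    internal = {}     # community -> number of edges with both ends inside it
    for a, b in edges:
        for node in (a, b):
            community = community_of[node]
            degree_sum[community] = degree_sum.get(community, 0) + 1
        if community_of[a] == community_of[b]:
            community = community_of[a]
            internal[community] = internal.get(community, 0) + 1
    return sum(
        internal.get(community, 0) / total_edges
        - (degrees / (2 * total_edges)) ** 2
        for community, degrees in degree_sum.items()
    )


two_groups = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2, "F": 2}
one_group = {node: 1 for node in "ABCDEF"}
print(round(modularity(EDGES, two_groups), 3))
print(round(modularity(EDGES, one_group), 3))
```

Dividing the toy network into its two triangles gives a clearly positive score, while treating everyone as a single community gives zero: the division is informative only when people inside a group are more closely tied to one another than chance would suggest.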

Figure 3 is a Gephi-generated data visualisation that graphs the connections between directors and actors for the whole period, 1918–1953, colourised to indicate communities (that is, groups of individuals more closely connected to each other than to others outside their community) and with node sizes based on their Eigenvector score (that is, people who know a lot of well-connected people are represented as larger circles). Gephi uses gravitational algorithms to make these graphs: nodes are driven out from a common starting point based on their size (or gravitational weight) but at the same time they are driven towards nodes they are strongly connected to.

1918–1953: Eigenvector centrality and communities.

The first thing to notice is that even though this network is clearly very interconnected (the solid coloured areas are made up of individual node dots), it is just as clearly divided into communities. Not surprisingly these communities partially break down in rough geographic terms. My analysis of the content of communities is based entirely on surnames associated with different regions and ethnicities. The orange and pink nodes at the bottom are predominantly Georgian and Armenian names.

Eigenvector centrality and communities, 1918–1953, detail.

The blue, green, and pink are overwhelmingly filled with nodes with Russian names. Names in blue are mostly people associated with the Leningrad film studio, Lenfilm; and the pink and green, with Mosfilm, the Moscow film studio (I haven’t discovered the difference between pink and green communities); the black in figure 3 is largely Ukrainian.

Eigenvector centrality and communities, 1918–1953, detail.

Each community is both clearly distinct and very much tied to the whole. Even after Mikhail Gelovani, who became well-known for playing Stalin in a number of films, and the prominent Georgian director, Mikhail Chiaureli, are integrated into the Soviet film industry, their connections measured in the aggregate for this period identify them as more connected to their geographical-ethnic community. The strong connectivity in geographic terms, I suspect, is the answer to the question I raised earlier: it is why individuals are less closely connected as a whole in the 1918–1953 period than they are in the 1918–1991 period that Bernstein studied. And this tells us something interesting about this period: in this earlier period the regional film industries were less integrated into the whole and individuals were less connected with filmmakers in other regions. One of the things I discovered is how that changes. But first, what about those nodes/people?

Who rises to the top with high scores for Eigenvector centrality, that is, who is connected to people who are themselves highly connected (see Figs. 5 and 6)? Vladimir Gardin was a prominent prerevolutionary actor, director, and, later, screenwriter; he was also one of the founders of the first Soviet film school, participating altogether in over 100 films. A notable number of the figures with the next size dots are associated with one of the most famous Soviet directors, Sergei Eisenstein: Nikolai Cherkasov, Oleg Zhakov, Andrei Abrikosov, and Mikhail Zharov, and, in Ukraine, Amvrosii Buchma. All had independent careers, but they also all played roles in Eisenstein’s Ivan Groznyi / Ivan the Terrible (1944–46/58, USSR). Maksim Shtraukh was a prominent stage and screen actor and Eisenstein’s childhood friend. Eisenstein himself, though, has a low Eigenvector score despite being arguably the most important individual in the early film industry, and the Artistic Director of Mosfilm after that position was introduced in 1941. Many people with close connections to Eisenstein, therefore, have high Eigenvector scores even though he does not. The person with the highest Eigenvector score for 1918–53, the Kevin Bacon of the Soviet screen (or, in fact, the Christopher Lee, who was the most highly connected actor in Hollywood of the 562,600 actors listed in Wikipedia at the time of calculation) is Vladimir Ural’skii. One has to have lived in the Soviet Union during this period and known a lot about Soviet actors to know the name Vladimir Ural’skii, but he had a prolific career.

Born Vladimir Popov in 1887 in Orenburg, he went to work at age 8 in a bakery but, as his official biography put it, he did not find his passion there. At 22 in 1909 he joined a theater where he achieved some acclaim. Just before the First World War, he was accepted to study at MKhAT, the acclaimed Moscow Art Theater, but in 1913 was exiled to Helsinki for political untrustworthiness. After the First World War, the Russian Revolution, and Civil War, he returned to Moscow in 1923 and got his first acting role in what would be one of the most important films of the silent period, Eisenstein’s first feature film, Strike, where he played a worker (kino-teatr.ru). He worked steadily during this entire period, acting in 136 films starting with Aelita (Yakov Protazanov, 1924, USSR) and ending with Zelenye ogni / Green Lights (Sploshnov and Shul’man, 1955, USSR).

Ural’skii’s career.

He worked on prestige films with great and famous directors, including many of the classics: he was a worker in Eisenstein’s Stachka / Strike (1924, USSR), a sailor in Bronenosets Potemkin / Battleship Potemkin (1925, USSR), and a priest in Ivan the Terrible (1944–46/58). In addition to these, he was in films directed by Yakov Protazanov, Vsevolod Pudovkin, Oleksandr Dovzhenko, Grigorii Kozintsev and Leonid Trauberg, Boris Barnet, Ivan Pyr’ev, Lev Kuleshov, Mikhail Romm, Aleksandr Zarkhi and Iosif Kheifits, Fridrikh Ermler, and Olga Preobrazhenskaia. He was in Mikhail Zharov’s first film as a director, and he was in Leonid Lukov’s Bol’shaia zhizn’ / Great Life (1946, USSR), which Stalin singled out for criticism in 1946 along with Ivan Groznyi Part II and Pudovkin’s Admiral Nakhimov (1947, USSR). After the war, he was in films directed by Iulii Raizman and Sergei Gerasimov; he was in Pyr’ev’s Kubanskie kazaki / Cossacks of Kuban, and he was in Stalingradskaia bitva / The Battle of Stalingrad (Vladimir Petrov, 1948–49, USSR) and Revizor / The Inspector General (Vladimir Petrov, 1952, USSR). But of the 130 films he appears in, he played named roles in only 36 (28%).