The Karasuk Culture: Potentially the Ancestors of Iranian and later Scytho-Sarmatian nomads

With the advances in aDNA, we have now begun to tackle questions such as the origin of the “Scythian peoples”. This was first addressed by Unterländer et al. (2017), and more samples were added in Damgaard et al. (2018). With the help of Allentoft et al. (2015), Mathieson et al. (2018), and Narasimhan et al. (2018), along with the two previously mentioned papers, I will examine the question of the origin of the early Iranian nomads.

Bagley (n.d.) attempted to summarize the work on the early Zhou period and its interaction with the Siberian Bronze Age center. This was based on work in Loewe & Shaughnessy (1999). It highlights interesting aspects of the trade between these two groups, with artifacts related to the Karasuk culture spreading not only to China, but also towards Europe (Bagley, n.d.). While the early dating of this movement (Chernykh, 2008) does not really match the genetic evidence so far, there are later samples which hint in this direction.


Since the time of Herodotus, many have had their own ideas on the origins of the Scythians. Mallory (1989) noted that some thought that the origin lay in the west, in the region north of the Black Sea. Others saw the Scythians, and Iranians in general, as originating in Central Asia, or even Siberia. Some have even thought that a multi-regional origin was more likely, with the changes being cultural rather than demographic.

Davis-Kimball (2005) was one who saw the Scythians as a multi-ethnic group, rather than a group with a single origin or a name denoting a single people. Sometimes anything west of Inner Mongolia and China was referred to as Scythian, but the term was also sometimes restricted to those in the Western and Central Steppes (Di Cosmo, 1999).

The first way to go at this, I feel, is to look at Karasuk, a culture that Mallory (1997) described as very mobile compared to Andronovo, and that is known more by its kurgan burials than its settlements. Karasuk is also seen as being highly influential and as starting the animal art so common among the “Scythian” people (Keyser et al., 2009). Mallory (1997) even mentions the potential of the Karasuk to have a specific “proto-Iranian” identity. The influence of the Yenisei and Slab Grave people cannot be underplayed (Mallory, 1997). Okunevo is thought to be a mix of Afanasievo and local Yeniseian groups (Great Soviet Encyclopedia, 1979), in an area later within the Andronovo sphere, and this mixing may well be what formed the Karasuk culture within the Minusinsk Basin. Okunevo is also thought to be the group that introduced realistic animal art to these later steppe pastoralists.

First of all, I wanted to take a look at the Karasuk cluster that is closer to the Andronovo samples in PCA. To understand the make-up of Karasuk, I first used qpAdm to find a valid model of their origin. The right populations (outgroups) chosen for qpAdm were Mbuti_DG, Ust_Ishim, Kostenki14, EHG, Villabruna, Ganj_Dareh_N, Anatolia_N, Steppe_EMBA, Karitiana, and Ami.

The most successful model of the Karasuk culture needed excess Han-related ancestry, in addition to the ENA found in the Okunevo samples. This is best exemplified by the Shamanka_BA run.


            Chi-square  Tail-prob   Andronovo  Okunevo  Han
estimate    17.866      0.0222543   0.721      0.279    NA
std error                           0.02       0.02     NA

            Chi-square  Tail-prob   Andronovo  Okunevo  Han
estimate    9.613       0.211584    0.766      0.178    0.056
std error                           0.026      0.04     0.019

            Chi-square  Tail-prob   Andronovo  Shamanka_BA
estimate    5.1         0.746845    0.814      0.186
std error                           0.016      0.016
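As a rough sanity check on the table above, the tail probabilities can be recomputed from the chi-square values. The sketch below is my own and uses only the Python standard library. The degrees of freedom (8 for two sources, 7 for three) are an assumption based on ten right populations; qpAdm prints the actual value in its output, which is not reproduced in this post.

```python
import math


def chi2_tail_prob(chi_sq: float, dof: int) -> float:
    """Upper-tail probability of a chi-square statistic (stdlib only)."""
    if chi_sq < 0 or dof < 1:
        raise ValueError("chi_sq must be >= 0 and dof must be >= 1")
    half = chi_sq / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum_{i < dof/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, dof // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd dof: erfc(sqrt(x/2)) plus a finite series in half-integer powers.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) / math.gamma(1.5) * math.exp(-half)  # i = 1 term
    for i in range(1, (dof - 1) // 2 + 1):
        total += term
        term *= half / (i + 0.5)  # step from term i to term i + 1
    return total


# Chi-square values from the Karasuk table; the dof values are assumed.
karasuk_models = {
    "Andronovo + Okunevo": (17.866, 8),
    "Andronovo + Okunevo + Han": (9.613, 7),
    "Andronovo + Shamanka_BA": (5.1, 8),
}

for name, (chi_sq, dof) in karasuk_models.items():
    print(f"{name}: tail-prob = {chi2_tail_prob(chi_sq, dof):.4f}")
```

Under those assumed degrees of freedom the recomputed values land close to the reported tail probabilities, which suggests the three rows are internally consistent.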

Looking at the deeper ancestry of the Karasuk culture, I tried to model them as a mix of Sintashta, Afanasievo, and an ENA group from the Baikal area, Shamanka_EN. This made sense as a mixture of Siberian hunters, Bronze Age steppe pastoralists, and Middle to Late Bronze Age groups in Central Asia. While the standard errors are a little high, it is clear that the dominant ancestry in Karasuk is Sintashta-related.

            Chi-square  Tail-prob   Sintashta  Shamanka_EN  Afanasievo
estimate    6.196       0.625314    0.686      0.189        0.125
std error                           0.069      0.014        0.07

After adding Steppe_MLBA, Germany_MN, and West_Siberia_N to the right populations (outgroups):

            Chi-square  Tail-prob   Sintashta  Shamanka_BA  Afanasievo
estimate    7.951       0.633621    0.541      0.178        0.281
std error                           0.081      0.017        0.081

Interestingly, the Karasuk culture is also seen to have expanded, or at least spread its influence, as far as the Aral Sea and possibly all the way towards the Black Sea (citation needed).

Other samples dating to about the same time, from north of the Aral Sea, are seen in Mezhovskaya. Even more interesting is that these samples are near genetic duplicates of the Karasuk samples. Could Mezhovskaya be part of a western Karasuk group that created the great cultural uniformity among the earlier Iranian nomads through the Scythian period? Potentially, yes.


            Chi-square  Tail-prob   Andronovo  Okunevo  Han
estimate    12.248      0.140492    0.741      0.259    NA
std error                           0.028      0.028    NA

            Chi-square  Tail-prob   Andronovo  Okunevo  Han
estimate    6.036       0.535555    0.784      0.151    0.064
std error                           0.032      0.051    0.025

            Chi-square  Tail-prob   Andronovo  Shamanka_BA
estimate    5.318       0.723087    0.846      0.154
std error                           0.022      0.022

With Chechushkov et al. (2018), we see that horse-riding in battle may have begun in Central Asia between 1500 and 1200 BCE, which is, of course, during the highly mobile Karasuk period and within the range of these groups.

Mezhovskaya can essentially be modeled as 100% Karasuk with qpAdm, as any additional ancestry is within the standard error of that component.

The next question, then, is whether Karasuk, and possibly by extension Mezhovskaya, represents the homeland and ancestors of the Scythians. Are they also ancestral to the western Scythians, as far as Hungary?



The first Scythian group I looked at was the Tagar Culture, which followed the Karasuk in the Minusinsk Basin. The Karasuk is indeed very important here for the Tagar. Even the Karasuk+Karasuk outlier combo works here. What’s even more interesting about the Tagar culture is the great similarity between their art and that of the European Scythians (Keyser et al., 2009; Encyclopaedia Britannica, n.d.).


            Chi-square  Tail-prob   Karasuk  Okunevo
estimate    5.291       0.726104    0.933    0.067
std error                           0.037    0.037

            Chi-square  Tail-prob   Karasuk  Shamanka_BA
estimate    7.201       0.515133    0.967    0.033
std error                           0.021    0.021
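One quick way to read these Tagar rows is to divide each minor component by its standard error. The snippet below is only a sketch of that arithmetic; the cutoff of 2 standard errors is my own assumed rule of thumb, not something qpAdm reports.

```python
# Minor (non-Karasuk) components from the two Tagar models above:
# source name -> (estimate, standard error).
TAGAR_MINOR = {
    "Okunevo": (0.067, 0.037),
    "Shamanka_BA": (0.033, 0.021),
}


def component_z(estimate: float, std_error: float) -> float:
    """Number of standard errors separating an admixture weight from zero."""
    if std_error <= 0:
        raise ValueError("std_error must be positive")
    return estimate / std_error


def distinguishable_from_zero(estimate: float, std_error: float,
                              z_cutoff: float = 2.0) -> bool:
    """True if the weight is at least z_cutoff standard errors from zero.

    z_cutoff = 2.0 is an assumed convention, roughly a 95% interval
    under a normal approximation.
    """
    return abs(component_z(estimate, std_error)) >= z_cutoff


for source, (est, se) in TAGAR_MINOR.items():
    z = component_z(est, se)
    flag = distinguishable_from_zero(est, se)
    print(f"{source}: z = {z:.2f}, distinguishable from zero: {flag}")
```

Both minor components fall short of the assumed cutoff, which fits with the Tagar samples being modeled as almost entirely Karasuk-derived.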



The Pazyryk Culture is another well-known group of Scythians, which includes the famous tattooed mummy. Their culture is seen as having been very warlike (citation needed).

They also require a lot of Karasuk ancestry, along with ancestry from nearby groups or groups closely related to these samples.

            Chi-square  Tail-prob   Karasuk  Okunevo  Han
estimate    16.037      0.0247822   0.313    0.34     0.347
std error                           0.036    0.05     0.023

            Chi-square  Tail-prob   Karasuk  Shamanka_BA  Han
estimate    1.761       0.971876    0.43     0.43         0.14
std error                           0.028    0.08         0.061



            Chi-square  Tail-prob   Karasuk  Okunevo
estimate    116.899     1.45E-21    0.41     0.59
std error                           0.089    0.089

            Chi-square  Tail-prob   Karasuk  BMAC   Han
estimate    6.075       0.531047    0.568    0.099  0.333
std error                           0.045    0.042  0.019


Tian-Shan Saka

            Chi-square  Tail-prob   Karasuk  Okunevo  BMAC   Han
estimate    10.628      0.10059     0.574    0.134    0.21   0.082
std error                           0.06     0.047    0.023  0.018

            Chi-square  Tail-prob   Karasuk  Shamanka_BA  BMAC
estimate    10.703      0.152108    0.618    0.173        0.209
std error                           0.04     0.017        0.033

The Tian-Shan Saka graph here did get a little over-complicated for my taste, but with such a complex mixture it might be bound to happen.



            Chi-square  Tail-prob   Karasuk  Okunevo  Han    BMAC
estimate    6.786       0.341095    0.429    0.284    0.180  0.107
std error                           0.06     0.051    0.02   0.034

            Chi-square  Tail-prob   Karasuk  Shamanka_BA  BMAC
estimate    2.488       0.927977    0.526    0.372        0.102
std error                           0.044    0.019        0.036


Scythian_Samara (Steppe_IA)

            Chi-square  Tail-prob   Karasuk  Armenia_EBA
estimate    20.194      0.00962638  0.923    0.077
std error                           0.048    0.048

            Chi-square  Tail-prob   Karasuk  BMAC
estimate    15.104      0.0571488   0.863    0.137
std error                           0.042    0.042

            Chi-square  Tail-prob   Karasuk  BMAC   West_Siberia
estimate    9.936       0.192223    0.769    0.166  0.065
std error                           0.067    0.045  0.037

            Chi-square  Tail-prob   Karasuk  BMAC   Botai
estimate    8.261       0.310125    0.674    0.236  0.089
std error                           0.108    0.068  0.056

            Chi-square  Tail-prob   Mezhovskaya  BMAC
estimate    12.765      0.120182    0.913        0.087
std error                           0.05         0.05

            Chi-square  Tail-prob   Tagar    BMAC
estimate    17.493      0.0253636   0.841    0.159
std error                           0.042    0.042
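With six competing models for Scythian_Samara, it can help to filter them mechanically. The sketch below keeps models whose tail probability clears a threshold and whose weights all lie between 0 and 1. The 0.05 threshold and the `QpAdmModel` class are my own assumptions for illustration; a higher tail probability among passing models should not be over-read as a better model.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class QpAdmModel:
    """One row of the Scythian_Samara table (hypothetical container)."""
    sources: Tuple[str, ...]
    tail_prob: float
    weights: Tuple[float, ...]


SAMARA_MODELS = [
    QpAdmModel(("Karasuk", "Armenia_EBA"), 0.00962638, (0.923, 0.077)),
    QpAdmModel(("Karasuk", "BMAC"), 0.0571488, (0.863, 0.137)),
    QpAdmModel(("Karasuk", "BMAC", "West_Siberia"), 0.192223, (0.769, 0.166, 0.065)),
    QpAdmModel(("Karasuk", "BMAC", "Botai"), 0.310125, (0.674, 0.236, 0.089)),
    QpAdmModel(("Mezhovskaya", "BMAC"), 0.120182, (0.913, 0.087)),
    QpAdmModel(("Tagar", "BMAC"), 0.0253636, (0.841, 0.159)),
]


def passing_models(models: List[QpAdmModel], alpha: float = 0.05) -> List[QpAdmModel]:
    """Keep models that are not rejected at alpha and have feasible weights."""
    return [
        m for m in models
        if m.tail_prob >= alpha and all(0.0 <= w <= 1.0 for w in m.weights)
    ]


for model in passing_models(SAMARA_MODELS):
    print(" + ".join(model.sources), f"(tail-prob {model.tail_prob:.3f})")
```

Under that assumed threshold, the Karasuk + Armenia_EBA and Tagar + BMAC models drop out, while the Karasuk and Mezhovskaya models with BMAC remain.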

Hungarian Scythian

            Chi-square  Tail-prob   Karasuk  Hungary_BA
estimate    13.624      0.0921081   0.355    0.645
std error                           0.035    0.035

            Chi-square  Tail-prob   Karasuk  Balkan_BA
estimate    13.99       0.0820368   0.247    0.753
std error                           0.037    0.037

            Chi-square  Tail-prob   Scythian_Samara  Hungary_BA
estimate    18.514      0.0176836   0.314            0.686
std error                           0.029            0.029

            Chi-square  Tail-prob   Mezhovskaya  Hungary_BA
estimate    16.258      0.0388319   0.339        0.661
std error                           0.043        0.043



Allentoft et al., Population genomics of Bronze Age Eurasia, Nature 522, 167–172 (11 June 2015) doi:10.1038/nature14507

Bagley, R. Shang Archaeology; The Northern Zone. (1999)

“Central Asian arts: Neolithic and Metal Age cultures”. Encyclopædia Britannica Online. Encyclopædia Britannica

Chechushkov et al., Early horse bridle with cheekpieces as a marker of social change: An experimental and statistical study, Journal of Archaeological Science, Volume 97, September 2018, Pages 125-136

Chernykh (2008), The Formation of the Eurasian “Steppe Belt” of Stockbreeding Cultures.

Di Cosmo, Nicola, “The Northern Frontier in Pre-Imperial China (1,500 – 221 BC)”, in: M. Loewe, E.L. Shaughnessy, eds, The Cambridge History of Ancient China: From the Origins of Civilization to 221 BC, Cambridge University Press, 1999, ISBN 9780521470308

Keyser, Christine; Bouakaze, Caroline; Crubézy, Eric; Nikolaev, Valery G.; Montagnon, Daniel; Reis, Tatiana; Ludes, Bertrand (May 16, 2009). “Ancient DNA provides new insights into the history of south Siberian Kurgan people”. Human Genetics. Springer-Verlag.

Mallory, J. P. (1997). Encyclopedia of Indo-European Culture. Taylor & Francis. ISBN 1884964982.

Mathieson et al. (2018), The genomic history of southeastern Europe, Nature 555, 197-203, doi:10.1038/nature25778

Narasimhan et al, The Genomic Formation of South and Central Asia, Posted March 31, 2018, doi:

“Okunev Culture”. The Great Soviet Encyclopedia. 1979

Unterländer et al., Ancestry and demography and descendants of Iron Age nomads of the Eurasian Steppe, Nature Communications 8, Article number: 14615 (2017), doi:10.1038/ncomms14615

Another look at South Asian aDNA

With Narasimhan et al. (2018), we got our first look at Central, South Central, and South Asian aDNA. Not only did we get to see new steppe samples throughout the Bronze Age, but also samples from the Chalcolithic through the Bronze Age in the Turan region, including BMAC. While there certainly looks to be steppe ancestry in South Asia, it has likely been highly inflated in models built on previously available aDNA, and in those that did not account for ANE that was already present in the region. With the soon-to-be-released Harappan sample(s), the models will only improve further.

This post will be constantly evolving as I add new outputs from qpAdm and qpGraph, so keep checking back in.

What I have noticed using qpAdm is that South Asian Dravidian groups do wonders as stand-ins for Harappan ancestry. So, we may see that some group greatly resembles them. I have seen that using the Palliyar and Paniya works well, but the Irula seem to work best. I don’t know whether that really means anything, or whether it simply reflects the fact that they have more coverage.

The first thing I did was to look for populations to occupy the right pops, that is, populations which create the most significant D-stats between my left populations (those used as sources in the mixture). Aside from an African outgroup, Mbuti_DG, I found that Ust-Ishim, Onge, Ami, EHG, Iron_Gates, Anatolia_N, Ganj_Dareh_N, and Karitiana worked well. Kostenki14 is hit and miss, as it doesn’t always produce significant stats when comparing two populations. This could be due to the age of the sample, which had not developed much drift that can help differentiate populations in the test. This can lead to higher chi-squares and lower tail-probabilities.

For the following, Brahmin_SGDP and Brahmin_Tiwari did have good marker counts, ranging from 170-200K, but the Brahmin_TN and Brahmin_UP sit around 50K, so they should be taken with a grain of salt.

SIS1= Shahr_I_Sokhta_BA1

Arm_EBA= Armenia_EBA


SGDP          chisq   tail prob  SIS1   Irula  Sintashta  Dali_EBA  Arm_EBA
w Kostenki    2.872   0.82475    0.208  0.623  0.1        0.069     NA
  std error                      0.038  0.035  0.036      0.035     NA
w/o Kostenki  2.031   0.844866   0.212  0.62   0.103      0.065     NA
  std error                      0.037  0.034  0.036      0.035     NA
w Kostenki    1.868   0.760109   0.167  0.609  0.065      0.097     0.063
  std error                      0.071  0.034  0.063      0.051     0.08
w/o Kostenki  2.599   0.761475   0.178  0.61   0.057      0.097     0.058
  std error                      0.063  0.035  0.065      0.046     0.081

Tiwari        chisq   tail prob  SIS1   Irula  Sintashta  Dali_EBA  Arm_EBA
w Kostenki    8.292   0.217452   0.138  0.583  0.208      0.071     NA
  std error                      0.025  0.021  0.023      0.02      NA
w/o Kostenki  7.332   0.19711    0.139  0.577  0.211      0.073     NA
  std error                      0.025  0.02   0.023      0.02      NA
w Kostenki    5.855   0.320606   0.089  0.579  0.154      0.099     0.08
  std error                      0.04   0.021  0.039      0.026     0.049
w/o Kostenki  5.351   0.253112   0.096  0.574  0.162      0.097     0.071
  std error                      0.039  0.02   0.039      0.026     0.049

TN            chisq   tail prob  SIS1   Irula  Sintashta  Dali_EBA  Arm_EBA
w Kostenki    0.925   0.988309   0.156  0.656  0.113      0.074     NA
  std error                      0.043  0.039  0.04       0.04      NA
w/o Kostenki  0.557   0.989882   0.168  0.643  0.117      0.072     NA
  std error                      0.042  0.037  0.038      0.039     NA
w Kostenki    0.861   0.973004   0.145  0.653  0.097      0.086     0.019
  std error                      0.079  0.039  0.073      0.049     0.095
w/o Kostenki  0.0624  0.960366   0.191  0.638  0.13       0.068     -0.027
  std error                      0.079  0.038  0.072      0.049     0.093

UP            chisq   tail prob  SIS1   Irula  Sintashta  Dali_EBA  Arm_EBA
w Kostenki    7.561   0.272087   0.147  0.598  0.181      0.075     NA
  std error                      0.032  0.028  0.031      0.031     NA
w/o Kostenki  5.636   0.343262   0.151  0.59   0.188      0.071     NA
  std error                      0.031  0.027  0.029      0.029     NA
w Kostenki    7.338   0.196693   0.11   0.599  0.144      0.094     0.054
  std error                      0.066  0.028  0.059      0.039     0.081
w/o Kostenki  5.866   0.209375   0.136  0.59   0.171      0.08      0.022
  std error                      0.061  0.027  0.054      0.037     0.073


Dzh1 = Dzharkutan1_BA, Late BMAC

Steppe_E = Steppe_MLBA_East

SGDP          chisq   tail prob  Irula  Dzh1   Sintashta  Steppe_E  Dali_EBA
w Kostenki    6.954   0.433685   0.681  0.203  0.116      NA        NA
  std error                      0.023  0.034  0.028      NA        NA
              6.294   0.505837   0.678  0.198  NA         0.124     NA
  std error                      0.023  0.034  NA         0.028     NA
              3.787   0.705485   0.629  0.225  NA         0.075     0.07
  std error                      0.029  0.036  NA         0.037     0.033

The above is interesting in that there are whole graves spread from India to West Asia that are completely late BMAC in character. Considering the number of cemeteries and remains, there seems to be no way for BMAC ancestry to be undetectable in South Asia. I think the Harappan sample(s) will show that BMAC ancestry is indeed important in South Asia.

Looking at the Swat Valley samples, it gets even more interesting…

Target        chisq   tail prob  Irula  Dzh1   Steppe_E  Dali_EBA
Aligrama      12.238  0.0568697  0.49   0.355  0.091     0.063
  std error                      0.03   0.036  0.034     0.03
Butkara_IA    5.148   0.524941   0.404  0.489  0.03      0.077
  std error                      0.029  0.034  0.034     0.031
Pak_IA_Ali    8.014   0.237064   0.431  0.419  0.087     0.063
  std error                      0.042  0.053  0.05      0.043
S_Sharif_IA   8.433   0.208056   0.437  0.364  0.141     0.059
  std error                      0.018  0.023  0.022     0.018
SPGT          11.378  0.0773638  0.316  0.503  0.113     0.069
  std error                      0.014  0.018  0.018     0.015

Interestingly, there seems to be no need for Andronovo admixture in Butkara, Pakistan_IA_Aligrama, and also the first Aligrama can do okay with just Dali, plus late BMAC. Of course, this all depends on the underlying population being similar to the Irula. Either way though, the Steppe ancestry should really not move. Next, I’ll see how including all BMAC samples affects the output.

Target        chisq   tail prob  Irula  BMAC   Steppe_East  WSiberia_N
Aligrama      15.054  0.0198432  0.5    0.359  0.083        0.058
  std error                      0.027  0.036  0.034        0.022
Butkara_IA    8.929   0.177591   0.411  0.477  0.052        0.06
  std error                      0.025  0.033  0.034        0.022
Pak_IA_Ali    6.755   0.344126   0.438  0.402  0.104        0.055
  std error                      0.037  0.053  0.05         0.032
S_Sharif_IA   10.458  0.106644   0.442  0.368  0.144        0.046
  std error                      0.015  0.022  0.022        0.013
SPGT          8.626   0.195754   0.313  0.509  0.102        0.076
  std error                      0.011  0.016  0.015        0.009


Update 8-22-18: Looking at Shahr_I_Sokhta 1, 2, and 3.

SIS1        chisquare  Tail-prob   Ganj_Dareh  W_Siberia
estimate    46.394     7.33E-08    0.953       0.047
std error                          0.02        0.02

SIS1        chisquare  Tail-prob   Ganj_Dareh  Anatolia  W_Siberia
estimate    2.715      0.84368     0.775       0.148     0.076
std error                          0.03        0.021     0.019

SIS1        chisquare  Tail-prob   Ganj_Dareh  Anatolia  W_Siberia  Onge
estimate    5.606      0.346473    0.716       0.172     0.095      0.017
std error                          0.051       0.027     0.022      0.031

SIS1        chisquare  Tail-prob   Ganj_Dareh  Anatolia  W_Siberia  Irula
estimate    2.275      0.80994     0.738       0.155     0.073      0.034
std error                          0.051       0.022     0.02       0.041

SIS2        chisquare  Tail-prob   Ganj_Dareh  W_Siberia
estimate    31.452     5.13E-05    0.837       0.163
std error                          0.021       0.021

SIS2        chisquare  Tail-prob   Ganj_Dareh  Anatolia  W_Siberia
estimate    31.336     2.19E-05    0.842       -0.005    0.163
std error                          0.036       0.023     0.022

SIS2        chisquare  Tail-prob   Ganj_Dareh  Anatolia  W_Siberia  Onge
estimate    7.09       0.214035    0.663       0.042     0.15       0.145
std error                          0.057       0.031     0.024      0.034

SIS2        chisquare  Tail-prob   Ganj_Dareh  Anatolia  W_Siberia  Irula
estimate    9.597      0.0874816   0.61        0.031     0.128      0.231
std error                          0.058       0.023     0.021      0.049

SIS2        chisquare  Tail-prob   Ganj_Dareh  W_Siberia  Irula
estimate    11.044     0.0870409   0.662       0.124      0.214
std error                          0.043       0.022      0.047

SIS2        chisquare  Tail-prob   SIS1   Irula
estimate    20.805     0.00407046  0.661  0.339
std error                          0.046  0.046

SIS2        chisquare  Tail-prob   SIS1   W_Siberia  Irula
estimate    12.337     0.0548625   0.636  0.074      0.29
std error                          0.045  0.025      0.049

SIS2        chisquare  Tail-prob   Sarazm_EN  Irula
estimate    7.525      0.376375    0.707      0.293
std error                          0.043      0.043

SIS3        chisquare  Tail-prob   Ganj_Dareh  W_Siberia
estimate    270.628    0           0.835       0.165
std error                          0.023       0.023

SIS3        chisquare  Tail-prob   Ganj_Dareh  W_Siberia  Onge
estimate    14.444     0.0250507   0.494       0.089      0.417
std error                          0.031       0.024      0.03

SIS3        chisquare  Tail-prob   Ganj_Dareh  Anatolia  W_Siberia  Onge
estimate    10.019     0.074706    0.401       0.063     0.097      0.439
std error                          0.052       0.028     0.023      0.032

SIS3        chisquare  Tail-prob   Ganj_Dareh  Anatolia  W_Siberia  Irula
estimate    6.653      0.247741    0.229       0.004     0.039      0.727
std error                          0.058       0.024     0.02       0.048

SIS3        chisquare  Tail-prob   Ganj_Dareh  W_Siberia  Irula
estimate    6.47       0.372669    0.231       0.035      0.734
std error                          0.042       0.02       0.047

SIS3        chisquare  Tail-prob   SIS1   Irula
estimate    5.423      0.608532    0.23   0.77
std error                          0.041  0.041

SIS3        chisquare  Tail-prob   Geoksiur  Irula
estimate    5.274      0.626561    0.213     0.787
std error                          0.038     0.038



Narasimhan et al, The Genomic Formation of South and Central Asia, Posted March 31, 2018, doi:





European Farmers Part I; Mediterranean vs Danubian

The farmers of Europe appear to be a very closely related group that derives from a potentially single source. We’ve seen several papers over the last couple of years devoted to farmers. Last year brought us Lipson et al. (2017), and Mathieson et al. (2017). These papers brought us many new samples from the Mediterranean, Central Europe, and the Balkans. The datasets from these two papers will be the source that I am working with here.

Firstly, I wanted to look at a simple tree to find a decent fit. That led to the following:

[Figure: Farmer simple]

This one was not a bad fit. It just had one zero drift edge towards Iron Gates, which would probably be taken care of if more hunters were included that lacked as much ANE as Iron Gates. While the Peloponnese samples are an outgroup to the other farmers, the Koros samples, from the First Temperate Neolithic, appear to be very close to the ancestral population for both the Balkan and Mediterranean groups. For the purpose of starting here, it seems fine. Next, I wanted to add Iberia EN as an offshoot of the Cardial EN samples from Croatia, just to see if the two of Mediterranean origin really are closely related.

[Figure: simple Farmer2]

This graph left what looks to be a needed admixture event from a hunter branch related to Iron Gates, to Iberia EN.

[Figure: simple Farmer3]

This graph actually turned out very nice. Iberia EN was able to branch from the same population as the Croatian Cardial and only needed a little extra HG ancestry. This also removed the zero edge from Iron Gates. For the next run, I am going to place LBK Austria coming off the branch to Starcevo.


The first thing I will try after seeing this worst Z-score is an admixture edge from the branch related to Iron_Gates into LBK Austria.

[Figure: simple Farmer5]

Surprisingly, the admixture from a European hunter did not take care of that worst Z-score. So, I scrapped that and decided to go with the admixture from Croatian Cardial into LBK Austria.

[Figure: simple Farmer6]

This graph resulted in LBK being a mix of 59% Starcevo and 41% Cardial. Still, we have a worst Z that wants Starcevo to also be closer to Cardial. In this case, I will first try a shared branch opposite of Koros, and if needed, after the HG-related admixture at B3.

[Figure: simple Farmer7]

While this is not a bad result, we do have a couple zero edges here that I would like to resolve. The admixture from the Cardial branch to LBK has also reduced to 5% in this graph. The worst Z involves the Peloponnese Neolithic and Starcevo, and also Iron Gates and LBK. I first want to try an edge from around Iron Gates to LBK to see how that does.


For this last graph, the A2 node for HG was eliminated since there was a 0 drift edge. All HG admixture now comes off of A1. The extra HG into LBK Austria has now put the worst Z-score around 3, which isn’t too bad. The edges all look good. The surprising part is that LBK comes out nearly 50% Cardial-related. This is interesting because LBK was seen as just a subset of late Starcevo, potentially with some Vinca influence. Since this is unexpected, I am going to see if there is more shared drift between the two before splitting, after the extra HG admixture coming after splitting with a group related to Koros EN.


Still, we have LBK Austria coming out as nearly half Cardial-related. While these results are interesting, they are not matching with D-stats, f3-ratio, or qpAdm results. There may be something else here that will take more complex graphs to figure out.

Here is another way of looking at it.


This graph makes a little more sense, with the separation of Mediterranean and Danubian groups a little clearer. The next step will be to separate the two before Koros EN. I will continue working from here for the rest of the post. If you have any more ideas, let me know. I will post updates as I have more.

Here are stats that have me thinking there is nothing here as far as admixture from Croatian Cardial.


Out Test Pop1 Pop2 D-stat Z-score SNPs
Mbuti_DG Cardial_EN LBK_Austria Starcevo -0.000114 -0.537 899799
Mbuti_DG Cardial_EN LBK_EN Starcevo 0.000076 0.398 903127
Mbuti_DG Cardial_EN LBK_Austria Koros_EN -0.000287 -0.959 892981
Mbuti_DG Cardial_EN LBK_EN Koros_EN -0.000035 -0.119 895930
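To make the reading of these four D-stats explicit, here is a small sketch that recovers the implied standard error (D divided by Z) and checks each row against a significance cutoff. The cutoff of |Z| >= 3 is an assumed convention on my part, and the helper names are hypothetical.

```python
# Rows copied from the D-stat table above: (Test, Pop1, Pop2, D, Z).
D_ROWS = [
    ("Cardial_EN", "LBK_Austria", "Starcevo", -0.000114, -0.537),
    ("Cardial_EN", "LBK_EN", "Starcevo", 0.000076, 0.398),
    ("Cardial_EN", "LBK_Austria", "Koros_EN", -0.000287, -0.959),
    ("Cardial_EN", "LBK_EN", "Koros_EN", -0.000035, -0.119),
]

Z_CUTOFF = 3.0  # assumed convention, not stated in the post


def implied_std_error(d_value: float, z_score: float) -> float:
    """Standard error implied by a reported D and its Z-score (Z = D / SE)."""
    if z_score == 0:
        raise ValueError("z_score must be non-zero")
    return abs(d_value / z_score)


def significant_rows(rows, cutoff: float = Z_CUTOFF):
    """Return the rows whose |Z| reaches the cutoff."""
    return [row for row in rows if abs(row[4]) >= cutoff]


for test, pop1, pop2, d_value, z_score in D_ROWS:
    se = implied_std_error(d_value, z_score)
    print(f"D(Mbuti_DG, {test}; {pop1}, {pop2}) = {d_value:+.6f}, SE ~ {se:.6f}")

print("significant rows:", significant_rows(D_ROWS))
```

None of the four rows reaches the assumed cutoff, so Cardial_EN does not look closer to either LBK group than to Starcevo or Koros_EN in these tests.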


Source Source Target f_3 std. Err Z SNPs
Koros_EN LaBrana Iberia_EN -0.002397 0.001668 -1.437 403414
Koros_EN Iron_Gates Iberia_EN -0.000355 0.001208 -0.294 598713
Koros_EN French_HG Iberia_EN -0.001492 0.001914 -0.779 127825
Cardial_EN LaBrana Iberia_EN -0.001178 0.001525 -0.773 410098
Cardial_EN Iron_Gates Iberia_EN 0.001799 0.001092 1.648 585125
Cardial_EN French_HG Iberia_EN -0.000595 0.001713 -0.348 131631
Koros_EN LaBrana LBK_EN -0.005487 0.00118 -4.65 580503
Koros_EN Iron_Gates LBK_EN -0.005153 0.000779 -6.615 755300
Koros_EN French_HG LBK_EN -0.003943 0.001382 -2.854 175316
Cardial_EN LaBrana LBK_EN -0.002165 0.001022 -2.118 555955
Cardial_EN Iron_Gates LBK_EN -0.001198 0.000649 -1.847 711734
Cardial_EN French_HG LBK_EN -0.001455 0.001165 -1.249 171888
Starcevo LaBrana LBK_EN -0.00003 0.000841 -0.035 600458
Starcevo Iron_Gates LBK_EN 0.000585 0.000568 1.031 770549
Starcevo French_HG LBK_EN -0.000667 0.001019 -0.654 180239
Koros_EN Iron_Gates LBK_Austria -0.005504 0.000912 -6.033 715204
Koros_EN Iron_Gates Cardial_EN 0.002384 0.001571 1.517 553269
Koros_EN Iron_Gates Iberia_EN -0.000355 0.001208 -0.294 598713
Starcevo Iberia_EN LBK_Austria 0.000357 0.000672 0.53 615071
Starcevo Iberia_EN LBK_EN -0.000208 0.000516 -0.403 665936
Starcevo Cardial_EN LBK_Austria 0.00095 0.00067 1.419 581722
Starcevo Cardial_EN LBK_EN 0.001295 0.000578 2.24 626360
Starcevo Iron_Gates LBK_Austria 0.000502 0.000715 0.702 729333
Starcevo Iron_Gates LBK_EN 0.000585 0.000568 1.031 770549
Koros_EN Iberia_EN LBK_EN -0.000175 0.000696 -0.251 647053
Koros_EN Iberia_EN Starcevo 0.000738 0.001072 0.689 496318
Koros_EN Cardial_EN LBK_EN 0.000021 0.000789 0.026 609865
Koros_EN Cardial_EN Starcevo -0.000584 0.001115 -0.524 470485
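The f3 table can be read the same way: Z is f3 divided by its standard error, and a clearly negative value is the usual sign of admixture in the target. The sketch below recomputes Z for a handful of rows copied from the table and flags those below an assumed cutoff of -3; the cutoff and the helper names are my own choices.

```python
# A subset of rows from the f3 table above:
# (Source1, Source2, Target, f3, std_err, reported_Z).
F3_ROWS = [
    ("Koros_EN", "LaBrana", "LBK_EN", -0.005487, 0.00118, -4.65),
    ("Koros_EN", "Iron_Gates", "LBK_EN", -0.005153, 0.000779, -6.615),
    ("Koros_EN", "French_HG", "LBK_EN", -0.003943, 0.001382, -2.854),
    ("Starcevo", "Iron_Gates", "LBK_EN", 0.000585, 0.000568, 1.031),
    ("Koros_EN", "Iron_Gates", "Iberia_EN", -0.000355, 0.001208, -0.294),
    ("Koros_EN", "Iron_Gates", "LBK_Austria", -0.005504, 0.000912, -6.033),
]


def f3_z(f3_value: float, std_err: float) -> float:
    """Z-score of an f3 statistic given its standard error."""
    if std_err <= 0:
        raise ValueError("std_err must be positive")
    return f3_value / std_err


def admixture_signals(rows, z_cutoff: float = -3.0):
    """Rows whose recomputed Z is at or below z_cutoff (assumed threshold)."""
    return [(s1, s2, target) for s1, s2, target, f3_value, se, _ in rows
            if f3_z(f3_value, se) <= z_cutoff]


for s1, s2, target, f3_value, se, reported in F3_ROWS:
    z = f3_z(f3_value, se)
    print(f"f3({target}; {s1}, {s2}): Z = {z:.3f} (reported {reported})")

print(admixture_signals(F3_ROWS))
```

In this subset only the Koros_EN plus hunter-gatherer pairs into LBK_EN and LBK_Austria clear the assumed cutoff, while the Starcevo-based row does not.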



Lipson, M. et al. Parallel palaeogenomic transects reveal complex genetic history of early European farmers. Nature 551, 368–372 (2017)

Mathieson, I. et al. The genomic history of Southeastern Europe. Nature 555, 197-203 (2018)


The first farmers, with a focus on Anatolia

Background on the Sites and Samples

We first learned about the demographics of the first farmers across the Levant and Zagros from Lazaridis et al. (2016). From that work, we saw that there were at least two independent developments towards agriculture, with two distinct populations (Lazaridis et al., 2016). One was located in the Levant (Natufians), while the other was at Ganj Dareh, in Iran (Lazaridis et al., 2016). We learned from the paper that Natufians and Ganj Dareh were both nearly equal in Basal Eurasian, yet Ganj Dareh had a lot of ancestry from a lineage related to ANE, and Natufians had little to none of this ancestry (Lazaridis et al., 2016).  The Levant Neolithic samples from PPNB to PPNC were a mix of something related to Natufians, and another lineage related to Anatolian farmers from Barcin and Mentese (Mathieson et al., 2015; Lazaridis et al., 2016).

Also in 2016, we caught our first glimpse of the Early Neolithic in Central Anatolia (Kilinc et al., 2016), on the Konya Plain. The Boncuklu group was actually a very early Neolithic group that did have small-scale agriculture (Baird et al., 2012), but was not “aceramic” as many believe. There were very early styles of pottery production and limited use of ceramics at this site, some of the earliest in West Asia (Fletcher et al., 2017). The Boncuklu farmers were a group with long runs of homozygosity, comparable to Western Hunter Gatherers (WHG) (Kilinc et al., 2016). Despite this, the Boncuklu samples are actually quite similar to the later farmers at Barcin, Mentese, and early European farmers (Kilinc et al., 2016), pointing to Anatolia as a potential third center of independent farming development, along with the Levant and Zagros.

Broushaki et al. (2016) also brought us more samples from the Zagros, that clustered with those samples from Ganj Dareh. These samples were from Tepe Abdul Hosein and Wezmeh Cave (Broushaki et al., 2016).

When it came to actually looking at the ancestral breakdown of Anatolians, Lazaridis et al. (2016) came up with a very solid model where Anatolians were a mix of lineages related to Ganj Dareh, Levant Neolithic, and WHG, with mixture proportions of 0.387, 0.339, and 0.274, respectively.

For the purpose of this post, I am going to focus on Boncuklu and their role in the formation of the Anatolian Neolithic. Since the Boncuklu data are shotgun-sequenced, I am also using shotgun-sequenced genomes from Broushaki et al. (2016) as my Iranian farmers (Tepe Abdul Hosein), along with the shotgun-sequenced Barcin farmers (Bar8, Bar31). Also, Kostenki14 and MA1 are being used as hunters, as these two form two poles of variation and are important in the formation of WHG and EHG. Lineages related to both are also important to the formation of these farmers.

In the analysis of the Boncuklu and Barcin farmers, Kilinc et al. (2016) stated that diversity from mixing with other similar groups and/or admixture from more southern groups is possible in the transition from Boncuklu to the later Neolithic of Anatolia. Taking a look at various methods of analysis, it does seem quite possible that a more southern source did admix into the Anatolian farmers of Barcin Hoyuk.


First, looking at D-stats, there isn’t anything very significant in determining whether Barcin or Boncuklu shares more with any particular population. There are near-significant results for Natufians, and less so for the Levant Neolithic samples.

Outgroup Test Pop1 Pop2 f4 Z-score SNPs
Mbuti_DG Ust_Ishim Boncuklu Anatolia_N1 -0.000282 -0.628 1081252
Chimp Ust_Ishim Boncuklu Anatolia_N1 -0.000416 -0.876 1036168
Mbuti_DG Natufian Boncuklu Anatolia_N1 0.00148 2.878 494327
Chimp Natufian Boncuklu Anatolia_N1 0.001535 2.843 472808
Mbuti_DG Levant_N Boncuklu Anatolia_N1 0.000891 2.171 804093
Chimp Levant_N Boncuklu Anatolia_N1 0.000694 1.556 769663
Mbuti_DG Tepe_Abdul Boncuklu Anatolia_N1 -0.000556 -1.448 1003898
Chimp Tepe_Abdul Boncuklu Anatolia_N1 -0.000645 -1.559 962310
Mbuti_DG Wezmeh Boncuklu Anatolia_N1 -0.00025 -0.562 1082448
Chimp Wezmeh Boncuklu Anatolia_N1 -0.000415 -0.882 1037321
Mbuti_DG CHG Boncuklu Anatolia_N1 -0.000361 -0.867 1083101
Chimp CHG Boncuklu Anatolia_N1 -0.000445 -1.031 1037961
Mbuti_DG Iron_Gates Boncuklu Anatolia_N1 -0.000623 -1.69 1076193
Chimp Iron_Gates Boncuklu Anatolia_N1 -0.000712 -1.803 1031346
Mbuti_DG EHG Boncuklu Anatolia_N1 -0.000393 -0.989 995924
Chimp EHG Boncuklu Anatolia_N1 -0.000544 -1.281 954789
Mbuti_DG W_Siberia Boncuklu Anatolia_N1 0.000176 0.404 818559
Chimp W_Siberia Boncuklu Anatolia_N1 0.000181 0.397 783377
Mbuti_DG Ami Boncuklu Anatolia_N1 -0.001048 -1.936 734487
Chimp Ami Boncuklu Anatolia_N1 -0.001122 -1.945 702374
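For readers who have not worked with these statistics, here is a minimal sketch of how f4 and D are computed from per-SNP allele frequencies. The frequencies below are made-up toy values purely for illustration, not data from this post, and the sketch leaves out the block jackknife that produces the Z-scores. The argument order (A, B; C, D) is my own labeling and is not meant to match the column order of the table.

```python
from typing import Sequence


def _check(a, b, c, d) -> int:
    """Validate that all four frequency vectors are non-empty and equal length."""
    n = len(a)
    if n == 0 or not (len(b) == len(c) == len(d) == n):
        raise ValueError("need four non-empty frequency vectors of equal length")
    return n


def f4(a: Sequence[float], b: Sequence[float],
       c: Sequence[float], d: Sequence[float]) -> float:
    """f4(A, B; C, D): mean over SNPs of (a - b) * (c - d)."""
    n = _check(a, b, c, d)
    return sum((w - x) * (y - z) for w, x, y, z in zip(a, b, c, d)) / n


def d_statistic(a: Sequence[float], b: Sequence[float],
                c: Sequence[float], d: Sequence[float]) -> float:
    """D(A, B; C, D): the f4 numerator normalized by a heterozygosity-like term."""
    _check(a, b, c, d)
    num = sum((w - x) * (y - z) for w, x, y, z in zip(a, b, c, d))
    den = sum((w + x - 2 * w * x) * (y + z - 2 * y * z)
              for w, x, y, z in zip(a, b, c, d))
    if den == 0:
        raise ValueError("no informative SNPs")
    return num / den


# Toy allele frequencies at two SNPs (hypothetical values).
pop_a = [0.0, 0.0]
pop_b = [1.0, 0.0]
pop_c = [1.0, 0.0]
pop_d = [0.0, 0.0]

print(f4(pop_a, pop_b, pop_c, pop_d))
print(d_statistic(pop_a, pop_b, pop_c, pop_d))
```

A value near zero means the first pair of populations is symmetrically related to the second pair, which is the pattern most rows in the table show.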


Also, looking at f3, there isn’t a lot to really support much in the way of admixture from Levant or Natufians into the Barcin Hoyuk samples.

Pop1 Pop2 Target f3 std error Z-score SNPs
Boncuklu Tepe_Abdul Anatolia_N1 0.004 0.002013 1.987 494742
Boncuklu Wezmeh Anatolia_N1 0.002506 0.002218 1.13 486870
Boncuklu Levant_N Anatolia_N1 -0.000089 0.001916 -0.047 382114
Boncuklu Tepecik Anatolia_N1 0.004217 0.00194 2.174 361064
Boncuklu Natufian Anatolia_N1 -0.001456 0.00235 -0.619 222845

qpAdm Models

Switching over to qpAdm, the first thing to check is what Boncuklu is best modeled as. With Kostenki, Natufians, and Tepe_Abdul_Hosein_N as the left pops, the simplest and seemingly best set of right populations turned out to be Mbuti_DG, Ust_Ishim, GoyetQ116-1, Vestonice16, MA1, Ami, Onge, and Karitiana.

            probability  Natufian  Tepe_Abdul  Kostenki14
estimate    0.236        0.232     0.449       0.32
std error                0.142     0.107       0.074

While the probability and standard errors are a little high, it is hard to keep them any lower. While adding something like Ganj_Dareh_N or Levant_N will reduce the standard error, the probability also dips significantly.

Moving onto Anatolia_N1 (Bar31, Bar8), the right pops, or outgroups became Mbuti_DG, Ust_Ishim, Kostenki14, MA1, Natufian, Tepe_Abdul_Hosein_N, Ami, Onge, Karitiana.

            probability  Boncuklu  Levant_N
estimate    0.815        0.687     0.313
std error                0.064     0.064

Also, swapping Natufian and Levant_N between left and right populations:

            probability  Boncuklu  Natufian
estimate    0.302        0.771     0.229
std error                0.071     0.071

qpGraph Models

After seeing some decent fits in qpAdm, I tried to use qpGraph to see how easily they could be replicated. To begin with, I started with a simple tree that contained just the Natufians and Tepe_Abdul_Hosein_N. The first tree showed what would be expected, with Natufians and Tepe_Abdul_Hosein_N having similar amounts of Basal Eurasian, and with Iranian farmers having ancestry from a lineage related to MA1. Interestingly, CHG comes out as a mix of Natufian, Tepe_Abdul_Hosein, and MA1.

6/4/18 New Update: Simple qpGraphs

The purpose of these simple graphs is to avoid a lot of admixture edges and to show the steps in finding the ancestry of Boncuklu, Levant, and Barcin samples. Samples that are not absolutely needed, such as Ust_Ishim, MA1, CHG, and any ENA population, will be left out. This also shows the work process by which I find where to look for admixture events.

The first graph here shows that by branching Boncuklu off of Natufian, there is a significant Z-score with Iranian farmers. The next step will be to branch from Iranians to Boncuklu.




The next output had a worst Z-score that involved Kostenki, meaning that an admixture line from the Kostenki branch will be the next admixture source into Boncuklu.
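The way I read each output can be sketched in a few lines of code: compute a Z-score for every f4 residual, pick the worst one, and look at which populations it involves to decide where the next admixture edge should go. The sketch below is only illustrative. The class, the function, the threshold of 3, and above all the numbers are made up for the example and are not taken from the runs in this post.

```python
from typing import NamedTuple, Optional, Sequence


class F4Residual(NamedTuple):
    pops: tuple        # the four populations in the f4 statistic
    fitted: float      # value implied by the graph
    observed: float    # value estimated from the data
    std_error: float

    @property
    def z(self) -> float:
        """Residual (observed minus fitted) in units of standard error."""
        return (self.observed - self.fitted) / self.std_error


def worst_residual(residuals: Sequence[F4Residual],
                   threshold: float = 3.0) -> Optional[F4Residual]:
    """Return the residual with the largest |Z| if it exceeds the threshold."""
    if not residuals:
        return None
    worst = max(residuals, key=lambda r: abs(r.z))
    return worst if abs(worst.z) > threshold else None


# Placeholder values for illustration only, NOT results from this post.
example = [
    F4Residual(("Mbuti_DG", "Kostenki14", "Natufian", "Boncuklu"), 0.0, 0.0042, 0.001),
    F4Residual(("Mbuti_DG", "Tepe_Abdul_Hosein_N", "Natufian", "Boncuklu"), 0.001, 0.0025, 0.001),
]

worst = worst_residual(example)
if worst is not None:
    print("Next edge should involve:", worst.pops, f"(Z = {worst.z:.1f})")
```

With the placeholder values, the worst residual involves Kostenki14 and Boncuklu, which is the kind of signal that would prompt adding an edge from the Kostenki branch into Boncuklu.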


After adding the admixture edge from near Kostenki, Boncuklu comes out mostly Kostenki-related, with the rest being a near even mix of Tepe Abdul Hosein and Natufian.


This created a pretty solid fit, so the next step is to see how the graph works with Levant_N added. Since the Levant samples harbor a lot of Natufian-related ancestry, that is where they are branched off first.


This output shows that an admixture edge is needed from a node on the Boncuklu line to Levant_N in order to drop the Z-score. A shared drift edge with Boncuklu was needed to move the worst Z-score closer to 2.


For the next run, I added the two shotgun samples from Barcin (Bar8, Bar31) to the tree. This left a worst Z > 3, so the next step will be an edge from a node related to Natufians into Anatolia_N1.


For the last graph, the edge from a Natufian-related node admixes into the Barcin Hoyuk samples. There remains a worst Z > 2.5 that appears to be asking for Iranian admixture into the Levant_N population. While it isn't necessary, I did look at adding an admixture edge from the Iranian farmers, and it only added 2% admixture. This graph should be sufficient as a simple tree that incorporates all of the first farmers.


Below are more complex graphs, with an Eastern non-African source (Onge), Ust-Ishim and also MA-1.


For the next step, I also added CHG to the tree, just to see where it was placed and to add another admixture source in West Asia. As other stats show, CHG was less basal than Iranian farmers, but also shifted towards Natufians, potentially sharing a formation story with Boncuklu.


After achieving a good fit with CHG, I then went on to see just how Boncuklu fit into the picture. As expected, Boncuklu is best fit as a mix of Natufian, Iranian, and a lineage related to Kostenki14, with admixture also from CHG. Trying to do just Iranian or just CHG led to significantly higher Z-scores. Both were needed. It appears there is additional ANE in Boncuklu that is not accounted for in Iranians.


After finding a good fit for Boncuklu, I next wanted to see just how the farmers at Barcin Hoyuk fit into the tree. For this, I first created a new simplified tree and also co-fit the Levant_N samples, to see if the best fit includes additional southern ancestry in the later Northwest Anatolian Neolithic.


I also tried a run with just transversion sites, just to see how that would turn out.
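One way to restrict a dataset to transversions is to classify each SNP by its two alleles and keep only those that are not A/G or C/T. The sketch below assumes an EIGENSTRAT-style .snp file, where the first column is the SNP ID and the fifth and sixth columns are the two alleles. The function names and the example IDs are made up, and this is not necessarily how the run above was set up.

```python
# Transitions are purine<->purine (A/G) or pyrimidine<->pyrimidine (C/T).
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def is_transversion(ref: str, alt: str) -> bool:
    """True if the allele pair is a transversion (neither A/G nor C/T)."""
    pair = frozenset((ref.upper(), alt.upper()))
    return len(pair) == 2 and pair <= set("ACGT") and pair not in TRANSITIONS


def transversion_ids(snp_lines):
    """Return the IDs of transversion SNPs from EIGENSTRAT .snp lines."""
    ids = []
    for line in snp_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        snp_id, ref, alt = fields[0], fields[4], fields[5]
        if is_transversion(ref, alt):
            ids.append(snp_id)
    return ids


# Made-up lines in .snp layout: id, chromosome, genetic pos, physical pos, ref, alt.
example_snp_lines = [
    "rs_example_1 1 0.0 1000 A G",
    "rs_example_2 1 0.0 2000 A C",
    "rs_example_3 1 0.0 3000 C T",
    "rs_example_4 1 0.0 4000 G T",
]
print(transversion_ids(example_snp_lines))  # ['rs_example_2', 'rs_example_4']
```

The complementary list of transition SNPs could then be handed to whatever SNP-exclusion option the analysis tools provide. Dropping transitions reduces the effect of ancient DNA deamination damage, at the cost of a much smaller SNP set.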


Just as a precaution, I ran another test, with Levant_N receiving admixture from a separate branch related to Boncuklu that is ancestral to Anatolia_N1, to see if that reduced or eliminated the need for admixture from Levant_N. As it turns out, Anatolia_N1 then asked first for admixture from Natufians, rather than Levant_N. So the question becomes: was there another population related to Levant_N, which lacked or had less Iranian farmer ancestry, that mixed into the ancestors of the Barcin and European farmers? More samples, specifically from southern and especially southeast Anatolia, and from the PPNA, will provide the answers that are needed. For argument's sake, I tried both the Natufian and Levant_N admixture edges to see how they turned out. The higher f4 and f3 values with Natufians, rather than Levant_N, lead me to believe this is possible.




The end result here has the Barcin samples as a mix of Boncuklu and Levant_N (or a more Natufian-like group), in agreement with qpAdm. However, this output shows only about two-thirds as much Levantine flow into later Anatolian farmers as qpAdm did. Using only transversion sites placed more ancestry related to Iranian farmers into both Boncuklu and Levant_N, and Boncuklu became significantly more Iranian than in the previous run, while the amount of flow from Levant_N into Barcin held pretty steady. More samples across time and space, with similar capture methods, will help resolve this. There isn't much in the way of D-stats or f3_ratio to suggest a lot of gene flow there, but qpAdm and qpGraph do provide a strong case for gene flow into Northwest Anatolia.

In upcoming posts I will look at the formation of European farmers, hunter admixture into the Middle Neolithic, and also at the roots of Epipaleolithic and Mesolithic European hunters.



Baird D. The Late Epipaleolithic, Neolithic, and Chalcolithic of the Anatolian Plateau, 13,000–4000 BC. In: Potts D., editor. A Companion to the Archaeology of the Ancient Near East. Wiley-Blackwell; 2012. pp. 431–466.

Broushaki F., Thomas M.G., …, Burger J. Early Neolithic genomes from the eastern Fertile Crescent. Science. 2016; 353: pp. 499-503.

Fletcher A., Baird D., Spataro M. & Fairbairn A. Early ceramics in Anatolia: Implications for the production and use of the earliest pottery. The evidence from Boncuklu Hoyuk. Cambridge Archaeological Journal. 2017; 27 (2): pp. 351-369.

Kilinc G.M., Omrak A., …, Gotherstrom A. The demographic development of the first farmers in Anatolia. Current Biology. 2016; 26(19): pp. 2659–2666.

Lazaridis I., Nadel D., …, Reich D. Genomic insights into the origin of farming in the ancient Near East. Nature. 2016; 536: pp. 419-424.

Mathieson I., Lazaridis I., Rohland N., Mallick S., Patterson N., Roodenberg S.A., Harney E., Stewardson K., Fernandes D., Novak M. Genome-wide patterns of selection in 230 ancient Eurasians. Nature. 2015; 528: pp. 499–503.