Wikipedia talk:Naming conventions (languages)
Disambiguation by language family
When there are two or more languages with the same name, they would typically be titled something like XXX language (YYY), where YYY is either the country/region, or the language family. Country names appear to be more common, but there are a few dozen articles using the language family. Most of those that use family disambiguation have just been boldly moved to XXX (YYY language) [1]. Anyone think this may actually be better? If not, I'm planning to have the moves reverted. – Uanfala (talk) 14:57, 20 March 2021 (UTC)
- IMO, there is a clear hierarchy: first comes the natural disambiguation (Foo could be a language, a toponym, a demonym etc.), so we have Foo language. If there happens to be another language with the same name, then another disambiguator is required which comes after Foo language: Foo language + (country/region) or less ideally "Foo language (Bar)" where "Bar" is the language family. I support reverting all moves per BRD. –Austronesier (talk) 21:18, 24 March 2021 (UTC)
- First, we should always have the title be XXX language as a matter of consistency per WP:AT, so the recent moves should be undone for that reason alone. I agree with Austronesier that country and region are better disambiguators than the language family. Readers are likely to know where the language is spoken, while the language family is only useful to specialists. — Wug·a·po·des 22:54, 24 March 2021 (UTC)
- There may be times we want to dab by 'family' instead of location. E.g. two languages in New Guinea, one on the Indonesian side and one in PNG, may be better dab'd as AN and Papuan. — kwami (talk) 01:25, 27 March 2021 (UTC)
- I've reverted most of those edits. Two tricky ones remain though:
- Pyu language (Sino-Tibetan) was historically spoken in what is nowadays Myanmar, and the article until recently used a country disambiguator. Do we go back to Pyu language (Burma), Pyu language (Myanmar), or do we try to eschew possibly anachronistic geographic disambiguators and stick to the language family?
- Taman (Sino-Tibetan language) is similarly confined to Myanmar, but given it went extinct in the 1930s, it probably won't be anachronistic to use Taman language (Burma) (assuming that's preferable to Taman language (Myanmar))? – Uanfala (talk) 15:11, 27 March 2021 (UTC)
- I suggest Pyu language (Sino-Tibetan) (anything else would be very anachronistic) and Taman language (Myanmar) (which isn't anachronistic since the descendants of Taman speakers still live in present-day Myanmar). –Austronesier (talk) 15:23, 27 March 2021 (UTC)
- And we should also resist tagging all Tibeto-Burman languages as "Sino-Tibetan". -- Kautilya3 (talk) 15:28, 27 March 2021 (UTC)
- Why? Sino-Tibetan is an established grouping in a way that Tibeto-Burman isn't. – Uanfala (talk) 19:22, 27 March 2021 (UTC)
I don't think anachronism is a problem. Whether we call the country 'Burma' or 'Myanmar' is a political decision. I prefer 'Burma' myself, but it's splitting hairs to say we can't use 'Myanmar' because that wasn't in use in English at the time. Doesn't matter: it's the same country, with no possibility of confusion. — kwami (talk) 06:13, 2 April 2021 (UTC)
- I recommend keeping the status quo with the XXX language (YYY) convention, for the reasons mentioned above. Lingnanhua (talk) 00:08, 10 April 2021 (UTC)
Proto-languages
Why are proto-languages almost exclusively found under "Proto-X language", or even deliberately moved there with reference to this guideline (e.g. Proto-Athabaskan language)? In virtually all cases, they should be the primary topic for the term "Proto-X" quite unambiguously. --Florian Blaschke (talk) 14:46, 23 January 2024 (UTC)
Changes to WP:DIALECT
@Mathglot: Hello! I've made changes to this section to reflect the current practice on the project. The key points are:
- Serbo-Croatian is no longer entitled with a disambiguator "language".
- Malaysian Malay is no longer entitled Malaysian language, so it is not a representative example for that paragraph.
- Several minor changes were made to clarify the rule with more scholarly terms ("other criteria" changed to "linguistic criteria", added "as a disambiguator", and "varieties which have standard forms" changed to "standardized varieties")
Can you please elaborate on why the previous version is, in your opinion, "better"? Thank you in advance! – Aca (talk) 08:20, 4 June 2025 (UTC)
- Hi, Aca. For the record, we are talking about this edit. The reason I reverted was that I found the wording changes *other* than the part about Malaysian not being a good example were not an improvement. However, I've edited again to remove Malaysian, as you prefer. I don't agree with the change of other criteria to linguistic criteria because it isn't always about linguistic issues, as in the case of Malaysian where it was more political, regardless whether we include that example or not. Finally, it seems to me the Malaysian example is still illustrative of that sentence, as long as we mention Malay language first and Malaysian Malay and Indonesian language second as the varieties. For the expressions standard form or standardized variety, I consider them equivalent and have no preference; if there is a good reason to change it, feel free.
- That said, I find that whole sentence more confusing than helpful, and I am not even sure it is true. Maybe we should just get rid of it entirely, or, if it is attempting to convey something useful to the reader (what?) then recast it entirely so it is clear and accurate. First of all, not all cases of standard forms/standardized varieties use the word language in the title here, e.g., Gulf Arabic vs. Kuwaiti Arabic, Bahrani Arabic, and Qatari Arabic, or as you pointed out, Serbo-Croatian. Just removing examples which do not fit the assertion, does not make the assertion true. And is Hindustani language an example of whatever this sentence is trying to say, or not? The relevant varieties are called Hindi and Urdu, not Hindi language or Urdu language (which are, however, redirects). Then there's the politics or nationalistic reasons which often (maybe mostly?) seem to affect naming as much as linguistic criteria as we have seen in the Malaysian example. Moldovan language is a politico-nationalist example, which is back to being called Romanian language as of 2023, due to rumblings in Transnistria and not linguistic reasons. Then there's Bokmål and Nynorsk, are they varieties of Norwegian language (as our article states) or is Bokmål really just Danish language, and Nynorsk something different from either of them?
- If after recasting, that sentence ends up saying something like, "We sometimes append language to an article about a language that has other, similar varieties, as with Danish language or Indonesian language and sometimes don't, as with Gulf Arabic or Bokmål", then I don't know what we've told them that is useful. Unless we can come up with something that is helpful in some way, maybe we should just get rid of it. Mathglot (talk) 17:51, 4 June 2025 (UTC)
- My 'maybe mostly?' parenthetical conjecture above may have been too timid. The more I think about it, the more I think that it is *always* based on political, cultural, ethnic, or nationalistic reasons, and never on linguistic reasons. The examples so far always have a country name in the title. (I can think of other examples that involve a region name, like Galician language, but the same argument obtains.) Do we have even a single example where the reason for the naming is linguistic, and does not involve the name of a country or region? I cannot think of one. Mathglot (talk) 18:16, 4 June 2025 (UTC)
- @Mathglot: Thank you for this valuable comment! I've already linked the edit above, sorry if it wasn't visible enough. I changed to "standardized variety" to be more precise here (see standard language). I agree that the rule was not written clearly. The term "language", per this rule, should be appended when we talk about standardized varieties and need to disambiguate. That's why Serbian language is not called Serbian variety, although it is a variety with solid linguistic (not "political" or "other") grounds (that's why I used "linguistic"). It is not called just Serbian as this is ambiguous. On the other hand, Bokmål and Nynorsk are not ambiguous, so we don't append "language". This is why I wrote that the term "language" is "commonly used as a disambiguator". We don't always use it, but only when it is needed and when we talk about varieties (per COMMONNAME). Hopefully this explains my edits and helps you to understand the essence of the rule. I'm not in favor of removing it as it documents why varieties are called languages, but I'm in favor of clarifying it. – Aca (talk) 19:59, 4 June 2025 (UTC)
- No worries on the link, I just overlooked it; my bad. I think this is the root of the problem:
The term "language", per this rule, should be appended when we talk about standardized varieties and need to disambiguate.
- Maybe that is what we are doing now, but that really is an inaccurate approach, precisely due to linguistic grounds, namely: the difference between a language and a standardized variety. What you wrote may well describe what we are doing now, at least in the case of Serbian language, and leads to your next comment:
That's why Serbian language is not called Serbian variety, although it is a variety with solid linguistic (not "political" or "other") grounds
- I totally agree with you that that's why it is not called Serbian variety, but as you point out in the second clause, that is precisely what it is. I.e., there are no linguistic grounds (okay, very weak ones, around script, and so on) for calling it a language, and plenty of linguistic grounds for calling it a variety. So, what are we doing, calling it a language?
- The problem is in a way confined to Wikipedia (and not sources) because of our need (as you said) to keep a disambig page located at Serbian (or at Danish, or Romanian, etc.). Whereas the rest of the world just uses a signifier like Serbian, Danish, or Romanian in context and everybody knows what they are talking about, without having to decide every single time whether the referent is a language, a variety or something else, we have to title it something, and the disambiguating term chosen (at least for these three) is language. But that's actually not sustainable linguistically speaking, imho.
- It could be we have bumped into an issue that is much bigger than we thought, and would require a wider discussion with more participants, and could conceivably result in an Rfc with an outcome requiring moving hundreds of language-related articles at Wikipedia. That would be daunting, but if that turns out to be the best outcome and consistent with our policies, then so be it, even if it means a ton of work over the horizon. The question of WP:COMMONNAME would doubtless come into it, and an interesting wrinkle would be weighing reliable, non-expert sources that use things like language, dialect, and variety loosely, vs. reliable linguistic sources that are more precise and avoid those terms unless they are accurate, and I have no idea how that might shake out.
- Back to the more limited context of dealing with that one sentence: yes to clarifying it, and I am not opposed to using the term language in the sentence in order to remain consistent with actual practice at Wikipedia for the time being, even if that practice is wrong (doesn't match the most reliable sources) and might require change in the long run. But I believe we will eventually have to deal with the larger issue. At least, that's how it seems to me at this point; what about you? Mathglot (talk) 21:54, 4 June 2025 (UTC)
- Couldn't help thinking about the bigger issue, and how an Rfc might go. There is no way people will accept Serbian variety as the title of that article. I think what would be the best in the long run, is that the language/variety article should be entitled Serbian, consonant with widespread usage in expert and non-expert sources, and the disambig page at that location should be renamed Serbian (disambiguation). That would still leave a lot of work on the horizon, but a bot could help. Mathglot (talk) 21:59, 4 June 2025 (UTC)
- @Mathglot: I totally agree that we should seek wider input to discuss the future of this rule. This is a question of recognizability vs. factuality, which is oftentimes a good debate topic. Wikipedia commonly leans toward recognizability (e.g., Caspian Sea instead of Caspian Lake). However, this is not always the case, and it is up to the community to decide. For now, I made this edit. Is it clearer now? Is there anything else to highlight? After we sort this out, feel free to start the RfC. – Aca (talk) 10:21, 5 June 2025 (UTC)
- Hi, Aca, yes, I think that works, at least for now. I am both oversubscribed, and also about to be on reduced availability at Wikipedia until July, so not a good time for me to start an Rfc. Also, per WP:RFCBEFORE, before we go that route, there should be more and broader discussion here first. I presume you are familiar with WP:WikiProjects; if you are willing, and interested, please check out WP:APPNOTE and consider inviting editors at relevant projects to this discussion. If you do, I will try to contribute as and when I can, but the more important thing, imho, is to broaden the discussion and solicit additional opinions. Thanks, Mathglot (talk) 07:52, 8 June 2025 (UTC)
- Thank you for these remarks! According to that, I just notified WikiProjects Languages and Linguistics, Village Pump, and a major contributor to this specific convention. – Aca (talk) 09:11, 8 June 2025 (UTC)
- Just a thought – would Serbian (language variety) or Serbian (variety) be a title option here? I think it may be hard for us to argue that the language variety is the primary topic over all the other topics listed at the DAB page Serbian, simply to resolve a minor linguistic quibble. Toadspike [Talk] 09:17, 8 June 2025 (UTC)
- @Toadspike: Those could be options, but I think the former is clearer. I can also think of the prefix "Standard" as a solution so that we would have Standard Serbian, Standard Indonesian, etc. I don't know what others think about this. – Aca (talk) 09:58, 8 June 2025 (UTC)
- No, it would not be an option, since it is nothing but making up terms for languages that have common names and language codes.
- The Serbian language is called Serbian language not Serbian variety or anything else.
- These languages weren't standardised until the 1800s and developed totally independently of each other. Marulić used the term Croatian language in the 1500s, while many, many, many years after him some even came up with terms such as "Illyrian". Koreanovsky (talk) 12:49, 8 June 2025 (UTC)
- Well, Toadspike and I didn't talk about Serbian variety as a title, but rather we brainstormed titles like Serbian (language variety) and Serbian (variety). Since we discussed parenthetical disambiguation, framing this as "making up terms" isn't valid or constructive, and I don't see a reason for the strong tone here. Furthermore, for article titles, we use available English-language sources, so Marulić's reference isn't relevant per se. Also, "Serbian language" might not be the COMMONNAME as many organizations use just "Serbian", and the same goes for "Croatian". – Aca (talk) 14:05, 8 June 2025 (UTC)
- we speak of the american language without meaning that it's not english. i don't think it's a problem to speak of the national language of a country as a 'language', as long as we don't misrepresent it in the article. mostly i think we just need to be cautious about abusing the word 'dialect'. there are several non-linguistic and IMO unencyclopedic uses of the word -- the languages of 'primitive' people that don't count as real languages [much as their 'fetishes' don't count as real religions], or distinct languages that are claimed by a socially dominant language or people as having an illegitimate identity. on the other hand, we have speakers of dialects who insist that they speak distinct languages as a way to promote ethnic identity. i don't think we can get away from the fiction that croatian, hindi and indonesian are 'languages', but IMO the main issue is to not let that become a slippery slope, where varieties are promoted or demoted on the basis of supposed legitimacy rather than on the basis of what someone learning them would experience. — kwami (talk) 09:49, 8 June 2025 (UTC)
- Using "language", while I understand the linguistic questions, is likely a clearcut application of WP:POVNAME. These various considerations about classification and the definition of "language" should be made clearly, as applicable to each individual article, in the lead and relevant parts of the body. That is not to say that other titles might not work (eg. the move of Malaysian language to Malaysian Malay), but that linguistic questions by themselves are likely not a huge problem for that title formulation. CMD (talk) 10:01, 8 June 2025 (UTC)
- I think the old adage goes "a language is just a dialect with an army and a navy". There aren't really any rigorous linguistic criteria to define what is a "language" vs other terms. Mutual intelligibility is a sliding continuum, and where on the continuum you cross over into a new "language" varies from context to context and depends heavily on political, cultural, and ethnic factors. So I don't know that we can do much better than WP:COMMONNAME here. Certainly I don't support using wording like Serbian (language variety) if the available sources don't use that term. -- LWG talk 14:49, 8 June 2025 (UTC)
- Agree. Even in reliable sources, the distinction between language and dialect may depend on where on a lumper-splitter continuum the authors lie, as well as other factors. Even though we have to introduce fine distinctions for article names, we should keep as close as possible to what reliable sources call the language (generic sense) spoken by a given group of people/in a given locality/territory. Donald Albury 15:12, 8 June 2025 (UTC)
- serbian and croatian are the same dialect as well, which complicates things. this isn't like macedonian and bulgarian, which are based on dialects chosen purposefully to be as distinct as possible. serbian and croatian were standardized to be as close as possible, both based on the eastern herzegovinian dialect of shtokavian. the old literary standard of croatian had been based on chakavian. similarly, malaysian and indonesian are both based on the johor dialect of malay, and modern standard hindi and urdu both on the delhi dialect of hindustani, all with the aim of being mutually intelligible. these aren't marginal cases - most of the time the speakers themselves can't tell the difference. so the question is how do we treat distinct standardizations of the same language. personally, i think 'language' is fine, but IMO we shouldn't use that as a precedent to call every dialect a language, or vice versa. for marginal cases we rely on best sources. people have brought up danish as an example, but at least the scandinavian languages are distinct enough that people need subtitles. — kwami (talk) 16:33, 8 June 2025 (UTC)