
/jp/ - Otaku Culture


>> No.46858977
File: 542 KB, 640x671, 1601526374583.png

>>46858949

>> No.45921248
File: 542 KB, 640x671, 1601526374583.png

>they should have bonus points for being a young woman if they care about their fertility rates

>> No.43698163
File: 542 KB, 640x671, spongebob_professional.png

wtf is the difference between a クモの網 and a くもの巣? does くもの巣 refer to just anything the spider lives in (in most cases the web, but sometimes a burrow), while クモの網 is always the web? goo.ne.jp defines each as the other...

>> No.42012778
File: 542 KB, 640x671, 1601526374583.png

>>42012442
>>42012453

>> No.37238874
File: 542 KB, 640x671, spongebob_professional.png

>>37237047
I see a shit ton of claims like "you need way more words in Japanese to get a similar level of comprehension!" but that doesn't align with the analysis I've done on my own with Japanese Text Analysis Tool (JTAT) and Morphman. JTAT was fed over 5000 novels (~3.7 million words) while Morphman was only fed 150 light/novels (~109k words) (Morphman is very RAM-inefficient, so that's all my computer could handle). Here are the coverage percentages at each 1k-word interval (hopefully the formatting stays):

parser | 1000 | 2000 | 3000 | 4000 | 5000 |
JTAT   | 75.4 | 80.6 | 83.5 | 85.6 | 87.1 |
MM     | 80.1 | 85.9 | 88.9 | 90.9 | 92.3 |

Without knowing exactly how the data was collected for that specific pic, it may not be 100% comparable, but I think some rough approximations can be made. Both of my results indicate that you can get higher coverage at each 1k step than that pic suggests. Morphman's data puts Japanese in line with the other languages, while JTAT shows that Japanese still falls a little behind, but not by a significant amount. It should be noted again that this data was collected only from novels, which are more likely to use flowery language than spoken language and more casual texts. Draw whatever conclusions you'd like, but I think the takeaway is that the claim of lower coverage in Japanese is exaggerated.
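
For anyone who wants to check numbers like these on their own corpus, here's a rough sketch of how coverage at a cutoff can be computed. This is not how JTAT or Morphman work internally; it assumes you already have a list of tokens from some parser, and `coverage_at` is just a name made up for illustration. Coverage here means the share of all token occurrences accounted for by the N most frequent words.

```python
from collections import Counter


def coverage_at(tokens, cutoffs):
    """Percent of all token occurrences covered by the top-N most frequent words.

    `tokens` is assumed to be already segmented output from some parser
    (hypothetical input; JTAT and Morphman each segment text differently).
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        # Empty corpus: nothing to cover
        return {n: 0.0 for n in cutoffs}
    # Occurrence counts sorted from most to least frequent word
    freqs = [count for _, count in counts.most_common()]
    return {n: round(100 * sum(freqs[:n]) / total, 1) for n in cutoffs}


# Toy corpus: 10 tokens, 4 distinct words
tokens = ["の"] * 5 + ["は"] * 3 + ["猫"] + ["犬"]
print(coverage_at(tokens, [1, 2, 3]))  # {1: 50.0, 2: 80.0, 3: 90.0}
```

On a real corpus you'd pass cutoffs like [1000, 2000, 3000, 4000, 5000]. How the parser splits and lemmatizes words changes the counts a lot, which is probably part of why the two tools give different numbers.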

>> No.34457114
File: 543 KB, 640x671, 1601526374583.png

>>34457097
it's not the same
