Quote:
Panta_:
Or install the NLTK package:
Code:
pip install nltk
Then:
Code (python):
import csv
import nltk

with open('/path/to/your_file.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    try:
        fh = open('/path/to/your_file.txt')
        lines = fh.read().replace('\n', '')
        lines = nltk.sent_tokenize(lines)
        for line in lines:
            writer.writerow([line])
    except IOError as e:
        print(f'OS error: {e}')
    finally:
        fh.close()
Panta, I installed nltk 3.4.5.
But it doesn't help much.
Anyway, I copied this text into Notepad:
Noël a lieu le 25 décembre. C’est une fête chrétienne qui célèbre la naissance de Jésus. Les familles se réunissent et partagent un bon repas le soir du 24 décembre, et on s’offre des cadeaux. Enfin, Pâques n’a pas de date fixe, c’est un dimanche compris entre le 22 mars et le 25 avril.
and saved it as original.txt with UTF-8 encoding. Then I changed its extension to .csv.
The code is currently this:
Code:
import csv
import nltk

with open('c:\FAJLOVI\Python_School\CSV\original.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    try:
        fh = open('c:\FAJLOVI\Python_School\CSV\original.csv')
        lines = fh.read().replace('\n', '')
        lines = nltk.sent_tokenize(lines)
        for line in lines:
            writer.writerow([line])
    except IOError as e:
        print(f'OS error: {e}')
    finally:
        fh.close()
I run it, and IDLE prints out a whole novel, all in red, so it's a no-go:
Code:
Traceback (most recent call last):
  File "C:/FAJLOVI/Python_School/CSV/koverzija.py", line 9, in <module>
    lines = nltk.sent_tokenize(lines)
  File "C:\Users\ja_sa\AppData\Local\Programs\Python\Python37-32\lib\site-packages\nltk\tokenize\__init__.py", line 105, in sent_tokenize
    tokenizer = load('tokenizers/punkt/{0}.pickle'.format(language))
  File "C:\Users\ja_sa\AppData\Local\Programs\Python\Python37-32\lib\site-packages\nltk\data.py", line 868, in load
    opened_resource = _open(resource_url)
  File "C:\Users\ja_sa\AppData\Local\Programs\Python\Python37-32\lib\site-packages\nltk\data.py", line 993, in _open
    return find(path_, path + ['']).open()
  File "C:\Users\ja_sa\AppData\Local\Programs\Python\Python37-32\lib\site-packages\nltk\data.py", line 701, in find
    raise LookupError(resource_not_found)
LookupError:
**********************************************************************
Resource punkt not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('punkt')
For more information see: https://www.nltk.org/data.html
Attempted to load tokenizers/punkt/english.pickle
Searched in:
- 'C:\\Users\\ja_sa/nltk_data'
- 'C:\\Users\\ja_sa\\AppData\\Local\\Programs\\Python\\Python37-32\\nltk_data'
- 'C:\\Users\\ja_sa\\AppData\\Local\\Programs\\Python\\Python37-32\\share\\nltk_data'
- 'C:\\Users\\ja_sa\\AppData\\Local\\Programs\\Python\\Python37-32\\lib\\nltk_data'
- 'C:\\Users\\ja_sa\\AppData\\Roaming\\nltk_data'
- 'C:\\nltk_data'
- 'D:\\nltk_data'
- 'E:\\nltk_data'
- ''
**********************************************************************
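Going by the traceback, the tokenizer data (punkt) is simply not downloaded yet. Separately, the script above opens original.csv with 'w' before reading it, which empties the file, so the input and the output probably need to be two different files. Below is a minimal sketch under those assumptions, not a definitive fix. The names naive_split, nltk_split and text_to_csv are made up for illustration, language="french" is a guess based on the sample text, and the resource name 'punkt' is the one the traceback asks for (other NLTK versions may want a different name). Reading with utf-8-sig is there because Notepad may put a BOM at the start of a UTF-8 file.

```python
import csv
import re


def naive_split(text):
    """Hypothetical fallback: split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]


def nltk_split(text, language="french"):
    """Split with NLTK's sentence tokenizer, fetching 'punkt' if it is missing."""
    import nltk  # imported here so the rest of the sketch works without NLTK
    try:
        return nltk.sent_tokenize(text, language=language)
    except LookupError:
        # Resource name taken from the traceback; an assumption for other versions.
        nltk.download("punkt")
        return nltk.sent_tokenize(text, language=language)


def text_to_csv(src_path, dst_path, split=nltk_split):
    """Read src_path, split it into sentences, write one sentence per CSV row."""
    # Read the whole input first, before the output file is opened for writing.
    with open(src_path, encoding="utf-8-sig") as src:
        # Collapse line breaks into single spaces so words are not glued together.
        text = " ".join(src.read().split())
    sentences = split(text)
    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        for sentence in sentences:
            writer.writerow([sentence])
    return sentences
```

It would be called along the lines of text_to_csv(r'c:\FAJLOVI\Python_School\CSV\original.txt', r'c:\FAJLOVI\Python_School\CSV\original.csv'), with the text kept as original.txt instead of renamed. The raw strings (r'...') keep the backslashes in the Windows path from being read as escape sequences.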