### Goals
- [x] #NitroDigest read and reply to the discussion about OOP: [https://github.com/Frodigo/garage/discussions/64](https://github.com/Frodigo/garage/discussions/64)
- [x] #NitroDigest research chunked prompts: [https://github.com/Frodigo/garage/issues/97](https://github.com/Frodigo/garage/issues/97)
- [x] #Garage add info about NitroDigest project to the [[Now]] page
### Notes
- Worked on [this issue](https://github.com/Frodigo/garage/issues/97) and tried to create a working prototype with Cursor, but it was not able to handle this #AIwillNotReplaceMeTooFast
    - it insisted on using tiktoken to count tokens in text, but tiktoken is not a good option for Llama models
    - the boilerplate code was OK, but the chunks were not created correctly
    - it tried naive solutions like adding a "safety margin of 1500 tokens to be sure that the chunk size will be OK"
- next steps and open questions:
    - is there any library that can help with counting tokens for Llama models? (one candidate is sketched after this list)
    - implement the code responsible for creating chunks
- Added diagrams to the issue
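Regarding the open question above, one candidate worth evaluating is a real Llama tokenizer loaded through Hugging Face `transformers`. A minimal sketch, assuming the package is installed; the `hf-internal-testing/llama-tokenizer` checkpoint is a hypothetical choice here and would need to match the model Ollama actually serves:

```python
# Sketch: counting tokens with a Llama tokenizer instead of tiktoken.
# Assumption: the `transformers` package is installed and the chosen
# tokenizer checkpoint matches the model served by Ollama.
from transformers import AutoTokenizer

# Any Llama-family tokenizer works here; this checkpoint is illustrative.
tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/llama-tokenizer")

def count_tokens(text: str) -> int:
    """Return the number of tokens the Llama tokenizer produces for `text`."""
    return len(tokenizer.encode(text, add_special_tokens=False))

print(count_tokens("Ollama truncates prompts that exceed the context window."))
```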
### Challenges & solutions
- #NitroDigest the prompt sent to Ollama is often too long, and Ollama truncates it, which hurts summary quality
    - solution: split the text into chunks (see the sketch after this list)
    - problem 1:
        - how to count tokens in text?
        - tiktoken?
            - but it's designed for OpenAI models, not for the Llama models served by Ollama
        - solution: for now I added simple code that counts tokens based on the model configuration
    - problem 2:
        - how to split text into sentences?
        - why?
            - I need to create chunks without splitting the text in the middle of a word
            - each chunk needs to make sense on its own
        - solution: NLTK
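
A minimal sketch of how the two solutions could fit together, assuming NLTK's punkt sentence tokenizer; the chars-per-token heuristic and names like `estimate_tokens` and `max_tokens` are illustrative stand-ins for the configuration-based counter, not NitroDigest's actual code:

```python
# Sketch: sentence-aware chunking so no chunk is cut mid-word and each
# chunk stays under the model's context limit. The token estimate below
# is a crude chars-per-token heuristic standing in for a real counter.
import nltk
from nltk.tokenize import sent_tokenize

nltk.download("punkt", quiet=True)      # sentence model (older NLTK)
nltk.download("punkt_tab", quiet=True)  # required by NLTK >= 3.9

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; replace with a model-specific counter."""
    return int(len(text) / chars_per_token) + 1

def split_into_chunks(text: str, max_tokens: int = 2000) -> list[str]:
    """Group whole sentences into chunks that fit under max_tokens."""
    chunks, current, current_tokens = [], [], 0
    for sentence in sent_tokenize(text):
        sentence_tokens = estimate_tokens(sentence)
        # Start a new chunk if adding this sentence would overflow it.
        if current and current_tokens + sentence_tokens > max_tokens:
            chunks.append(" ".join(current))
            current, current_tokens = [], 0
        current.append(sentence)
        current_tokens += sentence_tokens
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Grouping whole sentences (rather than slicing at a fixed character offset) is what keeps each chunk coherent on its own, at the cost of chunks being slightly uneven in size.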
### Useful snippets & resources
- [https://github.com/openai/tiktoken](https://github.com/openai/tiktoken)
- NLTK:
- [https://github.com/nltk/nltk](https://github.com/nltk/nltk)
- [https://www.nltk.org](https://www.nltk.org)
Continuation: [[6. 2025-04-23 - split text into chunks part 2]]