I have code like this. When I launch it, I get an ngrok link.
!pip install aiohttp pyngrok
import os
import asyncio
from aiohttp import ClientSession
# Set LD_LIBRARY_PATH so the system NVIDIA library becomes preferred
# over the built-in library. This is particularly important for
# Google Colab which installs older drivers
os.environ.update({'LD_LIBRARY_PATH': '/usr/lib64-nvidia'})

async def run(cmd):
    '''
    run is a helper function to run subcommands asynchronously.
    '''
    print('>>> starting', *cmd)
    p = await asyncio.subprocess.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )

    async def pipe(lines):
        async for line in lines:
            print(line.strip().decode('utf-8'))

    await asyncio.gather(
        pipe(p.stdout),
        pipe(p.stderr),
    )

await asyncio.gather(
    run(['ollama', 'serve']),
    run(['ngrok', 'http', '--log', 'stderr', '11434']),
)
I follow that link, but the page shows the following:
How can I fix this? Before that, I ran the following:
!choco install ngrok
!ngrok config add-authtoken -----
!curl https://ollama.ai/install.sh | sh
!command -v systemctl >/dev/null && sudo systemctl stop ollama
1. Run ollama but don't stop it
This means Ollama is running (but do check the output for errors, especially around graphics capability/CUDA, as these may interfere).
However, don't run
!command -v systemctl >/dev/null && sudo systemctl stop ollama
(unless you want to stop Ollama).
The next step is to start the Ollama service, but since you are using ngrok, I'm assuming you want to be able to run the LLM from other environments outside Colab. If this isn't the case, then you don't really need ngrok. Colab can be tricky to get working nicely with async code and threads, but it's useful for, e.g., running a VM powerful enough to play with larger models than anything you could run in your dev environment (if that's an issue).
2. Set up ngrok and forward the local ollama service to a public URI
Ollama isn't running as a service yet, but we can set up ngrok in advance:
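As a rough sketch of this step, assuming you use the pyngrok package from the question's `pip install` and have already added your auth token, the tunnel-opening function could look like this. `ngrok_tunnel_options` and `open_ngrok_tunnel` are hypothetical names, and the `host_header` rewrite (the tunnel option matching ngrok's `--host-header` flag) is an assumption about what Ollama needs behind a tunnel, not something taken from the steps above.

```python
# Hypothetical helpers for opening an ngrok tunnel to a local Ollama server.
# Assumes `pip install pyngrok` and that the ngrok auth token is already configured.

OLLAMA_PORT = 11434  # Ollama's default listen port


def ngrok_tunnel_options(port: int = OLLAMA_PORT) -> dict:
    """Keyword arguments for pyngrok's ngrok.connect() for an HTTP tunnel."""
    return {
        "addr": str(port),
        "proto": "http",
        # Assumption: rewrite the Host header so requests look local to Ollama.
        "host_header": f"localhost:{port}",
    }


def open_ngrok_tunnel(port: int = OLLAMA_PORT) -> str:
    """Open the tunnel with pyngrok and return its public URL."""
    from pyngrok import ngrok  # imported lazily: only needed when actually tunnelling

    tunnel = ngrok.connect(**ngrok_tunnel_options(port))
    return tunnel.public_url
```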
Run that code so the functions exist. Then, in the next cell, start ngrok in a separate thread so it doesn't hang your Colab. We'll use a queue to share data between threads, because we want to know the ngrok public URL once it's running:
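A minimal sketch of the thread-plus-queue idea: the tunnel is opened in a background thread, and whatever it returns (or the exception it raises) is put on a `queue.Queue` for the notebook to read later. `start_tunnel_thread` is a hypothetical name, and passing exceptions through the queue is an assumed convention.

```python
import queue
import threading
from typing import Callable


def start_tunnel_thread(
    open_tunnel: Callable[[int], str],
    port: int,
    results: queue.Queue,
) -> threading.Thread:
    """Run open_tunnel(port) in a background thread and put its outcome on `results`.

    Hypothetical helper: on success the public URL (a str) is queued; on failure
    the exception itself is queued so the notebook thread can re-raise it.
    """

    def worker() -> None:
        try:
            results.put(open_tunnel(port))
        except Exception as exc:  # report failures to the reader instead of losing them
            results.put(exc)

    # daemon=True: this thread won't keep the Python process alive on its own
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

In Colab you might call it as `results = queue.Queue()` followed by `start_tunnel_thread(lambda port: ngrok.connect(str(port), "http").public_url, 11434, results)`, after `from pyngrok import ngrok`.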
That will keep running, but you need to read the result from the queue to see what ngrok returned, so then run:
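Reading the result back is then a blocking `get` with a timeout. This sketch assumes the thread puts either the public URL string or an exception on the queue; `get_public_url` is a hypothetical name.

```python
import queue


def get_public_url(results: queue.Queue, timeout: float = 30.0) -> str:
    """Wait for the tunnel thread's result and re-raise it if it was an exception.

    Raises queue.Empty if nothing arrives within `timeout` seconds.
    """
    item = results.get(timeout=timeout)
    if isinstance(item, BaseException):
        raise item
    return item
```

For example, `print(get_public_url(results))`, where `results` is the queue the tunnel thread writes to.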
This should output something like:
3. Run ollama as an async process
That creates the function to run an async command but doesn't run it yet.
This will start ollama in a separate thread so your Colab isn't blocked:
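One way to do this, sketched under the assumption that you reuse the `run` coroutine from the question's code: Colab's kernel already has an asyncio event loop running in the main thread, so `asyncio.run()` can't be called there, but a separate thread can start its own loop. `run_coroutine_in_thread` is a hypothetical name.

```python
import asyncio
import threading
from typing import Any, Callable, Coroutine


def run_coroutine_in_thread(
    coro_fn: Callable[..., Coroutine[Any, Any, Any]], *args: Any
) -> threading.Thread:
    """Run coro_fn(*args) to completion on a fresh event loop in a daemon thread.

    Hypothetical helper: asyncio.run() creates and closes a new loop inside the
    worker thread, so the notebook's own loop is never blocked or reused.
    """
    thread = threading.Thread(
        target=lambda: asyncio.run(coro_fn(*args)),
        daemon=True,  # don't keep the process alive just for this thread
    )
    thread.start()
    return thread
```

For example, `ollama_thread = run_coroutine_in_thread(run, ['ollama', 'serve'])` starts the server and returns immediately, leaving the cell free.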
It should produce something like:
Now you're all set up. You can do the next steps in Colab, but it might be easier to run them on your local machine if that's where you normally develop.
4. Run an ollama model remotely from your local dev environment
This assumes you have installed ollama in your local dev environment, i.e. the laptop or desktop machine in front of you as opposed to Colab (say WSL2; I'm assuming it's Linux anyway).
Replace the actual URI below with whatever public URI ngrok reported above:
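On the command line you would typically run `export OLLAMA_HOST=` followed by the ngrok URL before running `ollama`. From Python, a quick way to confirm the tunnel reaches Ollama is its `GET /api/tags` endpoint, which lists the models pulled on the server. This is a sketch: the helper names are hypothetical, and treating a bare `host:port` value as plain HTTP is an assumption.

```python
import json
import os
import urllib.request

DEFAULT_OLLAMA_HOST = "http://127.0.0.1:11434"  # Ollama's default when OLLAMA_HOST is unset


def ollama_base_url() -> str:
    """Read the server address from OLLAMA_HOST (hypothetical helper)."""
    host = os.environ.get("OLLAMA_HOST", DEFAULT_OLLAMA_HOST).rstrip("/")
    if "://" not in host:
        host = "http://" + host  # assumption: a bare host:port means plain HTTP
    return host


def list_remote_models(base_url: str, opener=urllib.request.urlopen) -> list:
    """Return the model names reported by Ollama's GET /api/tags endpoint."""
    with opener(f"{base_url}/api/tags", timeout=10) as resp:
        data = json.load(resp)
    return [model["name"] for model in data.get("models", [])]
```

For example, `print(list_remote_models(ollama_base_url()))` after setting `OLLAMA_HOST` to your ngrok URL.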
You can now run ollama and it will run on the remote in your Colab (as long as that stays up and running).
For example, run this on your local machine: it will look as if it's running locally, but it's really running in your Colab, and the results are served to wherever you call it from (as long as OLLAMA_HOST is set correctly and points to a valid tunnel to your ollama service):
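If you'd rather call the remote model from Python than from the CLI, Ollama's REST API accepts `POST /api/generate`; with `"stream": false` it returns one JSON object whose `response` field holds the generated text. This is a minimal standard-library sketch: `build_generate_request` and `generate` are hypothetical names, the model name in the usage example is illustrative, and the model must already be pulled on the Colab side.

```python
import json
import urllib.request


def build_generate_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST /api/generate request (hypothetical helper)."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url: str, model: str, prompt: str,
             opener=urllib.request.urlopen, timeout: float = 300) -> str:
    """Send the request through the tunnel and return the generated text."""
    req = build_generate_request(base_url, model, prompt)
    with opener(req, timeout=timeout) as resp:
        return json.load(resp)["response"]
```

For example, `print(generate("<your ngrok URL>", "mistral", "Why is the sky blue?"))`. The long default timeout allows for a slow first response while the model loads.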
You can now interact with the model on the command line locally but the model runs on the Colab.
If you want to run larger models like mixtral, make sure your Colab is connected to a backend compute that's powerful enough (e.g. 48GB+ of RAM, so a V100 GPU is the minimum spec for this at the time of writing).
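A back-of-the-envelope check for whether a model fits: the weights alone take roughly parameters × bits per weight / 8 bytes. This sketch ignores the KV cache, activations and runtime overhead, so actual memory use is higher. For example, Mixtral 8x7B is reported to have about 46.7B total parameters, so at 4 bits per weight its weights alone come to about 23GB (decimal). `estimate_weight_memory_gb` is a hypothetical name.

```python
def estimate_weight_memory_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Rough size of the model weights alone, in decimal GB (1 GB = 1e9 bytes).

    Ignores the KV cache, activations and runtime overhead, so real usage is higher.
    """
    total_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9
```

For example, `estimate_weight_memory_gb(7, 16)` gives 14.0, i.e. a 7B model in 16-bit precision needs about 14GB for its weights.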
Note: If you see any CUDA or NVIDIA issues in the outputs of any of the steps above, don't proceed until you fix them.
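If the output is long, a crude keyword filter can help spot CUDA/NVIDIA problems in it. This is a hypothetical helper: it assumes only that relevant lines mention CUDA, NVIDIA or GPU together with a word like error, fail or warn, not any particular Ollama log format, so it can miss problems or flag harmless lines.

```python
import re
from typing import Iterable, List

# Hypothetical keyword patterns; no specific Ollama log format is assumed.
_GPU_WORDS = re.compile(r"cuda|nvidia|gpu", re.IGNORECASE)
_PROBLEM_WORDS = re.compile(r"error|fail|warn|not found|unable", re.IGNORECASE)


def gpu_problem_lines(lines: Iterable[str]) -> List[str]:
    """Return lines that mention the GPU stack together with a problem keyword."""
    return [line for line in lines
            if _GPU_WORDS.search(line) and _PROBLEM_WORDS.search(line)]
```

For example, pass it the lines printed by the `run` helper (collected into a list) and inspect whatever comes back before moving on.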
Hope that helps!
Gruff