(configurable) max filesize implemented
Remote media download just works (tm) too, very easy to get the remote homeserver base url properly with https://www.npmjs.com/package/@modular-matrix/autodiscover-client-configuration
now working on access token validation for upload, which will need some work on the <local> server bit too, accessing the synapse db for things.
But this standin works, much secure such wow :P
- proper database access token validation
- saving uploaded files to <local> server disk
added progress explanation so far to readme as well
did a bunch of code cleanup and refactoring, tomorrow hoping to get started on thumbnailing, and url previews after that, which would finish the spec compliance for the media repo :)
nice nice nice file uploads work very well now, properly stored where Synapse would normally expect them to be, and with a listing in the database
now to limit the in-memory cache on <remote> and check up the federation spec, and then /upload and /download are fully implemented
I love how the server-server spec for the media repo is "yeah just call their client-server endpoint" which is also how i implemented it already
ah fuck i really stumbled into a bad wormhole doing punycode domain testing, seems neither Synapse, Conduit nor matrix-media-repo can fetch media from those properly (and I can :3)
when the punycode is in the url literally like mxc://xn--puny--59d2hgc.dev.cthu.lu/testmedia, Synapse still fails, but Conduit and matrix-media-repo do fetch it correctly.
None of them do url decoding of the mxc right however..
time to write a nice memory cache
nevermind i can't be bothered anymore, I'll just make it remove the oldest entry...
the most basic of caches, it's a Map with an array tracking access order, removing the oldest accessed item from the map when it's about to get bigger than maxEntries
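A minimal sketch of that kind of cache, assuming string keys. The names here (`SimpleCache`, `touch`) are made up for illustration and are not taken from synapse-media-proxy:

```javascript
// Hypothetical sketch: a Map for the data plus an array tracking access order.
class SimpleCache {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.entries = new Map();
    this.order = []; // least recently accessed key first
  }

  // Move a key to the "most recently accessed" end of the order array.
  touch(key) {
    const index = this.order.indexOf(key);
    if (index !== -1) this.order.splice(index, 1);
    this.order.push(key);
  }

  get(key) {
    if (!this.entries.has(key)) return undefined;
    this.touch(key);
    return this.entries.get(key);
  }

  set(key, value) {
    // About to grow past maxEntries: drop the least recently accessed entry.
    if (!this.entries.has(key) && this.entries.size >= this.maxEntries) {
      const oldest = this.order.shift();
      this.entries.delete(oldest);
    }
    this.entries.set(key, value);
    this.touch(key);
  }
}
```

The `indexOf`/`splice` bookkeeping is linear in the number of entries, which is fine for a small cache but would want a smarter structure for a large one.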
hmm debug() logging says trans rights? and the Validator says you're valid too!
:O I think I implemented the whole https://spec.matrix.org/unstable/server-server-api/#server-discovery Matrix server discovery flow!!! It's rather complex, with .well-knowns and SRV records and combinations of those.
Also nice that I could fork off the client-spec counterpart already made by someone else :) https://www.npmjs.com/package/@modular-matrix/autodiscover-client-configuration
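A rough sketch of the order in which the spec's discovery steps get tried, to show why it gets complex. This is not the code of the actual library: `discoverServer`, `fetchWellKnown` and `resolveSrv` are hypothetical names, the two lookups are injected so the flow can be read (and run) without network access, and it ignores SRV priority/weight, caching, and the Host header / certificate rules the spec also defines.

```javascript
const net = require('node:net');

// Split "host", "host:port" or "[v6]:port" into its parts.
function splitHostPort(serverName) {
  const match = /^(\[[^\]]+\]|[^:]+)(?::(\d+))?$/.exec(serverName);
  if (!match) throw new Error(`invalid server name: ${serverName}`);
  return { host: match[1], port: match[2] ? Number(match[2]) : null };
}

function isIpLiteral(host) {
  return net.isIP(host.replace(/^\[|\]$/g, '')) !== 0;
}

// IP literal or explicit port: use as-is. Otherwise try SRV, then port 8448.
async function resolveDirect(host, port, resolveSrv) {
  if (isIpLiteral(host) || port !== null) return { host, port: port ?? 8448 };
  for (const prefix of ['_matrix-fed._tcp.', '_matrix._tcp.']) {
    const records = await resolveSrv(prefix + host);
    // Simplification: takes the first record instead of honouring priority/weight.
    if (records.length > 0) return { host: records[0].name, port: records[0].port };
  }
  return { host, port: 8448 };
}

// fetchWellKnown(host) is assumed to resolve to the "m.server" value, or null on failure.
// resolveSrv(name) is assumed to resolve to an array of { name, port } records (empty if none).
async function discoverServer(serverName, { fetchWellKnown, resolveSrv }) {
  const { host, port } = splitHostPort(serverName);
  if (isIpLiteral(host) || port !== null) return { host, port: port ?? 8448 };
  const delegated = await fetchWellKnown(host);
  if (delegated) {
    const target = splitHostPort(delegated);
    return resolveDirect(target.host, target.port, resolveSrv);
  }
  return resolveDirect(host, null, resolveSrv);
}
```

In a real setup `resolveSrv` could wrap `dns.promises.resolveSrv` (returning an empty array when the lookup fails) and `fetchWellKnown` an HTTPS request to `/.well-known/matrix/server`.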
now just have to wait for some responses from a few homeservers asking if I could add their servernames to my example.js, to show off the different flows and the method response you get
swapping out client discovery for server discovery in synapse-media-proxy makes it so I actually follow spec correctly there :)
some basic memory usage reporting, but memory management is an enigma so I can't really see immediate freeing when removing stuff from the cache etc
submitted 2 things to TWIM https://matrix.org/blog/category/this-week-in-matrix for the first time in aaaages :3
soo good to just come across a library that does what you need to *perfectly*. I was messing about with regexes to parse Content-Disposition stuff, and with this library I can do both the parsing and formatting sooo much nicer (and it's used by express.js so it's Good(tm))
so now I show all filenames correctly :3
I have a nice ttl invalidating cache for server lookups, and the content-disposition lib is fully integrated
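A TTL-invalidating cache like that can be quite small. This is only a sketch under assumptions: the `TtlCache` name, the lazy expiry on read, and the injectable `now` clock are illustrative choices, not necessarily how synapse-media-proxy does it.

```javascript
// Hypothetical sketch: entries expire ttlMs milliseconds after being set.
class TtlCache {
  constructor(ttlMs, now = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, handy for testing
    this.entries = new Map();
  }

  set(key, value) {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get(key) {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    // Expired entries are removed lazily, on the first read after expiry.
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }
}
```

Because expiry is only checked on read, entries that are never requested again stay in the Map; a periodic sweep would be needed if that matters.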
I also used vscode's incredible git integration to split those two changes into 2 commits after I had written both, with the suuper good visual cherry-picking of lines to commit
think I'll set up a test synapse-media-proxy soon(tm) but I'd accompany it with a testing synapse instance too, think NixOS should make it real easy to get that part up and running quick, and then I can get some real-world speedtests by just throwing test media links around :P
Monday though i suppose... i should really learn at least a bit for that fuckin midterm first
best thing about synapse-media-proxy development was looking a lot at fokshat.jpg in full-res tbh (and some other test images)
big refactor in preparation for thumbnailing support https://git.pixie.town/f0x/synapse-media-proxy/commit/7f2cb50b85ed30fbf6939087559ee5fab479e979
uhh, uhh I think I just fully implemented the thumbnailing in synapse-media-proxy??!?
I ❤️ well made npm libraries, `sharp` accepts both buffers and streams (directly from a remote media proxy), and JUST WORKS
And now you just get a proper error when trying to thumbnail an unsupported file (like a .txt lol), instead of crashing the server with an uncaught error :")
also lol I should fix that useragent, it's supposed to take the version from the package.json
/_matrix/media/r0/download/im_a/teapot now returns a picture of the Utah Teapot, with http status 418
url previews will be fun since I can specialcase a few types of urls (like youtube) that give totally unusable results currently (just a "Before you continue" instead of the title)
got started on the test deployment, great to do so with NixOS.
Already discovered and fixed some bugs but now turns out Synapse still won't serve my injected media so that needs more investigation tomorrow :/
aaaaa I got an absolute superthought in the shower on how to speed up concurrent access of non-cached media but I have a fucking meeting first before I can implement it aaaaa
currently an upstream request stream gets piped to the first requestor, and to a buffer for the later cache, but instead I should store a reference to the stream immediately so it can be piped to new requestors immediately as well, while it's still in progress!
ok subscribing to streams when they come available works, subscribing to an already existing stream doesn't because some of the data will already be read-out from it (and thus removed).
And seems having multiple subscribers to the same stream isn't ideal either as varying network speeds/stream consumption would give a similar issue, hmmm
in short: shit
I think I can do a cool stream splitting thing with late-joins but it'll be a bit more complex (and I have a (short) meeting in 20 mins..)
I guess this is the second yakshaving time where I really dive deep into the internals of a Node subsystem (last time it was the module system, resulting in https://www.npmjs.com/package/@require-transpile/core)
I did the proper thing and looked at existing implementations! and there's a module to split a stream to multiple consumers (nice), but nothing that keeps a buffer to backfill late-joiners. This will integrate *perfectly* with my current architecture because I'm already saving the whole stream into a buffer anyways (for later cache serves)
- first request comes in, upstream starts streaming to the first client
- second client requests that file while it's still streaming, it gets a new stream with the buffer up till now + then the new data
- upstream request finishes
- new clients get the whole cached buffer
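The plan above could be sketched roughly like this. It is not the actual synapse-media-proxy implementation: `BackfillStream` and its methods are hypothetical names, every consumer gets its own `PassThrough` so a slow reader doesn't hold back the others, and error handling and backpressure are left out entirely.

```javascript
const { PassThrough } = require('node:stream');

// Hypothetical sketch: fan one upstream out to many consumers, backfilling late joiners.
class BackfillStream {
  constructor() {
    this.chunks = [];           // everything received so far (also the later cache entry)
    this.consumers = new Set(); // streams of clients that joined while in progress
    this.finished = false;
  }

  // Wire up a real upstream Readable, e.g. the response of the remote request.
  attach(upstream) {
    upstream.on('data', (chunk) => this.push(chunk));
    upstream.on('end', () => this.finish());
    return this;
  }

  push(chunk) {
    this.chunks.push(chunk);
    for (const consumer of this.consumers) consumer.write(chunk);
  }

  finish() {
    this.finished = true;
    for (const consumer of this.consumers) consumer.end();
    this.consumers.clear();
  }

  // New client: replay the buffer up till now, then follow along with new data.
  subscribe() {
    const consumer = new PassThrough();
    for (const chunk of this.chunks) consumer.write(chunk);
    if (this.finished) consumer.end();
    else this.consumers.add(consumer);
    return consumer;
  }
}
```

Usage would be something like `const shared = new BackfillStream().attach(upstreamResponse)` followed by `shared.subscribe().pipe(res)` for each incoming request, where `upstreamResponse` and `res` stand in for the real request objects.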
love it when a plan comes together
this sounds dangerously like I know what I'm doing, we'll see if my coding proves that wrong
good news: I did not really know what I was doing!
but now it is done, another biiiiig refactor commit with the new streams architecture https://git.pixie.town/f0x/synapse-media-proxy/commit/091e9dc346a23abdab2a4a660857fee30530c4df
next I do probably want to add some disk caching too so it's not all memory based
and prometheus metrics
and url previews ofc
synapse-media-proxy serving files well :3
backed by an actual Synapse here, running on my new NixOS homeserver
hope I have time to implement metrics soon and then I'll upload an image to some busy Matrix room and see it fetched by a billion other homeservers
ah yes and this classic video https://media.pixie.town/_matrix/media/r0/download/media.pixie.town/pHAuiyxqRyQE80iciHfhDwMx
lol you can definitely see when I started testing things (aura is the <remote> component, cosmos the <local> server at home)
servers with literally just constant prometheus traffic have such pleasing straight network graphs
submitted another TWIM with the synapse-media-proxy updates, icymi:
- fancy dashboard in progress! https://stats.pixie.town/d/rPBvoh6Gk/synapse-media-proxy?orgId=1&from=now-30m&to=now
- teapots! https://media.pixie.town/_matrix/media/r0/download/im_a/teapot
the stats dashboard is interesting, there's a lot of people clicking the media link (or browsers prefetching it?) from the This Week In Matrix article i presume
And here is This Week In #Matrix ft. me again :P
or i guess scrolling through the TWIM room backlog and their servers fetching it from there, maybe i should collect requesting hs names if that's in some header
so, i implemented most of url previewing yesterday :3
ranty about Synapse
and fucking hell knowing how easy that was, I'm so fucking disappointed in how absolutely terrible Synapse's previews are. Even though the API results are named after OpenGraph tags THEY DON'T ACTUALLY USE OPENGRAPH but instead do some actually wack parsing so you get 0 usable info out of tons of sites, whereas they serve you all you have to know on a fucking platter in their opengraph tags....
@f0x You're not sending caching headers, or is that intentional?
@erikk haven't added those yet, no. dunno how useful those even are when remote users (majority) will fetch it through their own media repo, which caches it indefinitely anyways
@erikk synapse sends cache-control public,max-age=86400,s-maxage=86400 (1 day) I guess I could just copy that
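Copying that header would be a one-liner. A small sketch, where the `cacheControlHeader` helper name is made up and the one-day default just mirrors the value quoted above:

```javascript
// Hypothetical helper: builds the same Cache-Control value Synapse is quoted as sending.
function cacheControlHeader(maxAgeSeconds = 86400) {
  return `public,max-age=${maxAgeSeconds},s-maxage=${maxAgeSeconds}`;
}

// In a Node http handler this would be applied with:
//   res.setHeader('Cache-Control', cacheControlHeader());
```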
@f0x it downloads basically instantly.... are you a wizard?
@dumpsterqueer just having it served directly from the cheap hetzner box has soo much better internet than my home :D
and it's even cached in RAM there :)
@f0x aha :P that'll do it!
@f0x I don't know what any of this means but it looks cool
@anarchiv it's a complement to my Matrix server, which is hosted at home through not so great internet.
This alleviates a lot of the slowness by taking the spikes from image/video download on a second, much smaller server which has better internet
@f0x God you're cool
one day I'll be in there!
@f0x oh wow, that is a big fucking difference!