Thread

Shadrack Gichana 9:00 AM
Hey team, my name is Shadrack Gichana, I'm a full-stack developer from Nairobi. I've been going through the codebase today and I'd love to work on adding a Dropbox knowledge connector. I went through the existing connectors (Notion, SharePoint, GitHub), and from my understanding the pattern is validateConfig, testConnection, incremental sync with a cursor-based checkpoint, frontend fields, and registry wiring. The Dropbox API v2 maps cleanly onto that same structure. I'll open a proposal issue on GitHub first if the team is open to it.
👋1

24 replies
joey (archestra team) 8:46 PM
hi there Shadrack, welcome! 👋
A dropbox connector definitely makes a lot of sense, great idea. Feel free to create a GitHub issue and open a PR.
I would say that if you can add a demo video (and some tests) to that contribution it makes the core team’s review a bit easier 🙂
Shadrack Gichana 7:41 AM
Thanks Joey! Appreciate the warm welcome and the guidance.
Will do on both. I'll include a short demo video showing the connector in action (config, test connection, and an incremental sync run) and add unit + integration tests following the patterns I see in the existing connectors.
I'll get the GitHub issue up shortly and tag you once the PR is ready for review. Looking forward to contributing! 🙌
Shadrack Gichana 8:41 PM
Hey Joey, I raised an issue about the Dropbox connector, which I wanted to finish up on and then submit together with the PR, but I see someone hijacked the idea and opened an issue. I'm almost done with the code and the tests have run successfully. I'm finalizing now. Can the idea be reserved for me, or can I still submit my PR?
joey (archestra team) 8:42 PM
ah yes I see what you mean, yes no worries, you brought forward the idea first so I'll review your PR first, no stress 🙏
joey (archestra team) 8:43 PM
(totally forgot about this thread when I saw that github issue)
Shadrack Gichana 9:28 PM
Thanks Joey, I appreciate it. I'll finish up the final touches and submit the PR for your review.
:archestra-love:1
Shadrack Gichana 1:39 AM
Hey Joey, PR is ready for review: https://github.com/archestra-ai/archestra/pull/3780
26 unit tests passing. Note that the full end-to-end sync demo requires the backend running with this branch, happy to provide that if needed.
Thank you.
👀1
joey (archestra team) 3:21 PM
will take a look today, thanks! question on this. Will this support subfolder traversal?
If not in its current state, I think we should add it. Our Google Drive connector has logic for doing subfolder discovery/traversal - I think it would be worthwhile extracting that logic out from the google drive connector and making it generic enough such that it could be reused for the dropbox connector
(this would be super helpful as we could then reuse that same logic for sharepoint (see https://github.com/archestra-ai/archestra/issues/3665))
Shadrack Gichana 7:28 PM
Hey Joey, just left a summary on the PR on github but wanted to flag here too. I appreciate the feedback, all your review comments are addressed: subfolder traversal with a shared BFS utility, batchSize chunking fix, fileTypes field in the frontend, and migrated to the official Dropbox SDK. CLA is signed. PR is ready for another look when you get a chance: https://github.com/archestra-ai/archestra/pull/3780
:archestra-love:1
Shadrack Gichana 7:31 PM
Also just to answer the previous question. Yes, subfolder traversal is now fully supported. I extracted the BFS folder traversal logic from the Google Drive connector into a shared folder-traversal.ts utility (traverseFolders function with a FolderTraversalAdapter interface), and wired the Dropbox connector to use it. The traversal is folder-by-folder using filesListFolder with recursive: false per folder, which matches the Google Drive pattern exactly.
The same utility is ready for SharePoint to use to fix #3665, meaning any connector just needs to implement listDirectSubfolders and pass it to traverseFolders. Both recursive and maxDepth are configurable via the connector config.
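For readers following the thread, here is a minimal sketch of the breadth-first traversal described above. The names traverseFolders, FolderTraversalAdapter, listDirectSubfolders, recursive and maxDepth come from the messages; the signatures, the options shape, the string folder identifiers, and the "root is depth 0" convention are assumptions made for illustration and may not match the real folder-traversal.ts.

```typescript
// Hypothetical shapes: the real FolderTraversalAdapter may look different.
interface FolderTraversalAdapter {
  // Returns only the folders directly inside `folder` (no recursion).
  listDirectSubfolders(folder: string): Promise<string[]>;
}

interface TraversalOptions {
  recursive: boolean; // when false, only the root folder is returned
  maxDepth?: number; // root is depth 0; undefined means no limit (assumed convention)
}

// Breadth-first walk: expand one level of folders at a time.
async function traverseFolders(
  root: string,
  adapter: FolderTraversalAdapter,
  options: TraversalOptions,
): Promise<string[]> {
  const visited = new Set<string>([root]);
  const ordered: string[] = [root];
  let frontier: string[] = [root];
  let depth = 0;

  while (options.recursive && frontier.length > 0) {
    if (options.maxDepth !== undefined && depth >= options.maxDepth) break;
    const next: string[] = [];
    for (const folder of frontier) {
      for (const child of await adapter.listDirectSubfolders(folder)) {
        if (visited.has(child)) continue; // skip duplicates and cycles
        visited.add(child);
        ordered.push(child);
        next.push(child);
      }
    }
    frontier = next;
    depth += 1;
  }
  return ordered;
}
```

Under these assumptions, a connector only supplies listDirectSubfolders (for Dropbox, presumably a thin wrapper around filesListFolder with recursive: false), and the shared function owns the queueing, depth limit, and duplicate handling.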
🙌1
joey (archestra team) 8:19 PM
last comment I would have would just be around platform/backend/src/knowledge-base/connectors/folder-traversal.ts - do you mind also refactoring backend/src/knowledge-base/connectors/gdrive/gdrive-connector.ts to use this shared traversal logic? this way that logic is in one spot 🙂
additionally, it is worth adding a note in docs/pages/platform-adding-knowledge-connectors.md to mention that when adding a new knowledge connector, if it needs to do recursive subfolder traversal, to prefer using platform/backend/src/knowledge-base/connectors/folder-traversal.ts
Shadrack Gichana 8:28 PM
On it. I'll refactor the Google Drive connector to use traverseFolders and add the note to the docs. Will push shortly.🫡
:archestra-love:1
Shadrack Gichana 8:48 PM
Hey Joey, just pushed. Google Drive connector now uses traverseFolders too, added 9 unit tests for the utility itself, and documented it in the adding connectors guide. All tests are passing.
👍1
joey (archestra team) 12:28 AM
any chance you are able to re-upload/share your demo video? 🙂 I see this when trying to watch
Shadrack Gichana 6:59 AM
Hey Joey, pushed fixes for both blockers -- recursive and maxDepth are now in the Dropbox schema, and the spy assertion in the traversal test is properly wired. Also regenerated the shared types with CODEGEN=true so the frontend picks up the dropbox connector type correctly. One heads up -- there's a pre-existing build error in sign-out-with-idp-logout.tsx on getIdentityProviderIdpLogoutUrl that's unrelated to this PR. 74 tests passing across the Dropbox connector, Google Drive, and folder traversal utility. Demo video: https://youtu.be/b05cE0FftAE -- had to switch from Loom as it was rejecting the file. Ready for another look.
👀1
Shadrack Gichana 1:41 PM
Hello Joey, pushed the two fixes. Working on the end-to-end demo with Tilt right now -- will share the recording once the environment is up.
🙏1
Shadrack Gichana 6:36 PM
Hello Joey, I'm having trouble with API credits, but I'll send the video within an hour. Apologies for any inconvenience caused.
joey (archestra team) 6:36 PM
no worries 🙂
Shadrack Gichana 10:06 PM
Hey Joey, pushed the cursor fix -- full sync now gets a root-scoped cursor after BFS traversal so incremental sync tracks the whole tree correctly. Here's the updated demo video: https://youtu.be/VK8H1iodAP8 -- shows the connector syncing files from nested Dropbox subfolders and the chat querying the knowledge base.
I wasn't able to get inline citations, though, because that needs an OpenAI embedding key, which I didn't have available -- the sync and retrieval pipeline is working as shown in the logs and in the video. 34 tests passing.
Shadrack Gichana 10:32 PM
Also, I'm no longer able to comment directly on the PR because of new changes: “An owner of this repository has limited the ability to comment to users that have contributed to this repository in the past.”
joey (archestra team) 10:53 PM
hey 👋 yes we made a small on-boarding change for contributors (we’ve had a massive influx of AI slop contributions), we’ll have a blog post on the topic soon but in the interim you just need to go here and follow the steps and you should be all set!
Shadrack Gichana 11:15 PM
Thanks for that. I've gone through the guide and followed the instructions. On the demo video, unfortunately I couldn't get inline citations working because I don't have an OpenAI embedding key set up yet; I'm using Llama locally. But the full pipeline from sync to chunk to retrieval is working as demonstrated. Let me know if the demo looks good or if there's anything you'd like me to adjust.
Shadrack Gichana 10:57 AM
Hello Joey, I've been going over the code again and wanted to clarify the cursor approach: after the BFS traversal completes, we call filesListFolder with recursive: true on the root path to get a single root-scoped cursor. That cursor is what gets saved in the checkpoint. All subsequent incremental syncs use filesListFolderContinue with that cursor, which means Dropbox tracks changes across the entire tree from that point forward and not just the last folder walked. The CI failures are pre-existing in main and unrelated to the Dropbox changes. Happy to walk through the cursor logic in more detail if that would help.
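The cursor flow described above can be sketched roughly as follows. filesListFolder, filesListFolderContinue and recursive: true are taken from the message, and the entries, cursor and has_more fields follow the Dropbox list_folder result. Everything else (the narrowed ListFolderClient interface, the Checkpoint shape, and the helper names drainPages, rootScopedCheckpoint and incrementalSync) is hypothetical and is not taken from the PR.

```typescript
// Minimal subset of a Dropbox list_folder result used by this sketch.
interface ListFolderResult {
  entries: { path_lower?: string }[];
  cursor: string;
  has_more: boolean;
}

// Hypothetical narrowed client: just the two calls named in the thread.
interface ListFolderClient {
  filesListFolder(arg: { path: string; recursive?: boolean }): Promise<{ result: ListFolderResult }>;
  filesListFolderContinue(arg: { cursor: string }): Promise<{ result: ListFolderResult }>;
}

// Hypothetical checkpoint shape: only the cursor is persisted.
interface Checkpoint {
  cursor: string;
}

// Follow has_more until the listing is exhausted; the last page's cursor
// is the one that represents "everything seen so far".
async function drainPages(first: ListFolderResult, client: ListFolderClient) {
  const entries = [...first.entries];
  let page = first;
  while (page.has_more) {
    page = (await client.filesListFolderContinue({ cursor: page.cursor })).result;
    entries.push(...page.entries);
  }
  return { entries, cursor: page.cursor };
}

// After the full sync: one recursive listing on the root yields a cursor
// scoped to the whole tree rather than to the last folder walked.
async function rootScopedCheckpoint(client: ListFolderClient, rootPath: string): Promise<Checkpoint> {
  const first = (await client.filesListFolder({ path: rootPath, recursive: true })).result;
  const { cursor } = await drainPages(first, client);
  return { cursor };
}

// Incremental sync: resume from the saved cursor and return what changed
// plus the new checkpoint to persist.
async function incrementalSync(client: ListFolderClient, checkpoint: Checkpoint) {
  const first = (await client.filesListFolderContinue({ cursor: checkpoint.cursor })).result;
  const { entries, cursor } = await drainPages(first, client);
  return { changed: entries, checkpoint: { cursor } };
}
```

The point of the sketch is the scoping: a cursor obtained from a per-folder listing would only report changes under that folder, whereas the cursor from the recursive root listing covers every subfolder in later filesListFolderContinue calls.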
Shadrack Gichana 6:38 PM
Hi Joey, glad the PR is merged! I truly appreciate the guidance and feedback throughout the review process. I have some ideas for potential connectors from my analysis of the codebase, and will come better prepared next time. Looking forward to the next contribution!