Hey team, my name is Shadrack Gichana, I'm a full-stack developer from Nairobi. I've been going through the codebase today and I'd love to work on adding a Dropbox knowledge connector. I went through the existing connectors like Notion, SharePoint, GitHub and from my understanding the pattern is `validateConfig`, `testConnection`, incremental sync with a cursor-based checkpoint, frontend fields, and registry wiring. The Dropbox API v2 maps cleanly onto that same structure. I'll open a proposal issue on GitHub first if the team is open to it.
hi there Shadrack, welcome! 👋
A dropbox connector definitely makes a lot of sense, great idea. Feel free to create a GitHub issue and open a PR.
I would say that if you can add a demo video (and some tests) to that contribution it makes the core team’s review a bit easier 🙂
Shadrack Gichana —
Thanks Joey! Appreciate the warm welcome and the guidance.
Will do on both. I'll include a short demo video showing the connector in action (config, test connection, and an incremental sync run) and add unit + integration tests following the patterns I see in the existing connectors.
I'll get the GitHub issue up shortly and tag you once the PR is ready for review. Looking forward to contributing! 🙌
Shadrack Gichana —
Hey Joey, I was preparing an issue for the Dropbox connector, which I planned to finish and raise when I submit the PR, but I see someone has hijacked the idea and opened an issue already. I'm almost done with the code and the tests have run successfully. I'm finalizing now. Can the idea be reserved for me, or can I still submit my PR?
joey (archestra team) —
ah yes I see what you mean, yes no worries, you brought forward the idea first so I'll review your PR first, no stress 🙏
joey (archestra team) —
(totally forgot about this thread when I saw that github issue)
Shadrack Gichana —
Thanks Joey, I appreciate it. I'll finish up the final touches and submit the PR for your review.
Shadrack Gichana —
Hey Joey, PR is ready for review: github.com/archestra-ai/archestra/pull/3780
Demo video: www.loom.com/share/6acdfc43f854443fad21aeebae8594c5
26 unit tests passing. Note that the full end-to-end sync demo requires the backend running with this branch, happy to provide that if needed.
Thank you.
joey (archestra team) —
will take a look today, thanks! question on this. Will this support subfolder traversal?
If not in its current state, I think we should add it. Our Google Drive connector has logic for doing subfolder discovery/traversal - I think it would be worthwhile extracting that logic out from the google drive connector and making it generic enough such that it could be reused for the dropbox connector
(this would be super helpful as we could then reuse that same logic for sharepoint (see github.com/archestra-ai/archestra/issues/3665))
Shadrack Gichana —
Hey Joey, just left a summary on the PR on github but wanted to flag here too. I appreciate the feedback, all your review comments are addressed: subfolder traversal with a shared BFS utility, batchSize chunking fix, fileTypes field in the frontend, and migrated to the official Dropbox SDK. CLA is signed. PR is ready for another look when you get a chance: github.com/archestra-ai/archestra/pull/3780
Shadrack Gichana —
Also just to answer the previous question. Yes, subfolder traversal is now fully supported. I extracted the BFS folder traversal logic from the Google Drive connector into a shared `folder-traversal.ts` utility (`traverseFolders` function with a `FolderTraversalAdapter` interface), and wired the Dropbox connector to use it. The traversal is folder-by-folder using `filesListFolder` with `recursive: false` per folder, which matches the Google Drive pattern exactly.
The same utility is ready for SharePoint to use to fix #3665 meaning any connector just needs to implement `listDirectSubfolders` and pass it to `traverseFolders`. Both `recursive` and `maxDepth` are configurable via the connector config.
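For readers following along, the traversal described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: only the names `traverseFolders`, `FolderTraversalAdapter`, `listDirectSubfolders`, `recursive`, and `maxDepth` come from the thread, while the signatures, the `TraversalOptions` type, and the in-memory adapter are assumptions made for the example.

```typescript
// Hypothetical shapes: the real folder-traversal.ts may use different
// signatures. Only the identifier names are taken from the thread.
interface FolderTraversalAdapter {
  // Returns the IDs (or paths) of the immediate child folders only.
  listDirectSubfolders(folderId: string): Promise<string[]>;
}

interface TraversalOptions {
  recursive: boolean;
  maxDepth?: number; // assumption: undefined means "no depth limit"
}

// Breadth-first walk: visits the root, then every folder one level down,
// then two levels down, and so on, until maxDepth levels were expanded.
async function traverseFolders(
  adapter: FolderTraversalAdapter,
  rootId: string,
  options: TraversalOptions,
): Promise<string[]> {
  const visited = new Set<string>([rootId]);
  const ordered: string[] = [rootId];
  if (!options.recursive) return ordered;

  let frontier: string[] = [rootId];
  let depth = 0;
  while (
    frontier.length > 0 &&
    (options.maxDepth === undefined || depth < options.maxDepth)
  ) {
    const next: string[] = [];
    for (const folderId of frontier) {
      for (const child of await adapter.listDirectSubfolders(folderId)) {
        if (visited.has(child)) continue; // guard against cycles/duplicates
        visited.add(child);
        ordered.push(child);
        next.push(child);
      }
    }
    frontier = next;
    depth += 1;
  }
  return ordered;
}

// In-memory adapter standing in for a Dropbox or Google Drive client.
const tree: Record<string, string[]> = {
  "/": ["/a", "/b"],
  "/a": ["/a/x"],
  "/b": [],
  "/a/x": [],
};
const inMemoryAdapter: FolderTraversalAdapter = {
  listDirectSubfolders: async (id) => tree[id] ?? [],
};
```

Under these assumptions a connector only supplies `listDirectSubfolders`; the queueing, depth limit, and de-duplication live in one place, which is the point of sharing the utility across connectors.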
joey (archestra team) —
last comment I would have would just be around `platform/backend/src/knowledge-base/connectors/folder-traversal.ts` - do you mind also refactoring `backend/src/knowledge-base/connectors/gdrive/gdrive-connector.ts` to use this shared traversal logic? this way that logic is in one spot 🙂
additionally, it is worth adding a note in `docs/pages/platform-adding-knowledge-connectors.md` to mention that when adding a new knowledge connector, if it needs to do recursive subfolder traversal, to prefer using `platform/backend/src/knowledge-base/connectors/folder-traversal.ts`
Shadrack Gichana —
On it. I'll refactor the Google Drive connector to use `traverseFolders` and add the note to the docs. Will push shortly.🫡
Shadrack Gichana —
Hey Joey, just pushed. Google Drive connector now uses `traverseFolders` too, added 9 unit tests for the utility itself, and documented it in the adding connectors guide. All tests are passing.
joey (archestra team) —
any chance you are able to re-upload/share your demo video? 🙂 I see this when trying to watch
Shadrack Gichana —
Hey Joey, pushed fixes for both blockers -- `recursive` and `maxDepth` are now in the Dropbox schema, and the spy assertion in the traversal test is properly wired. Also regenerated the shared types with CODEGEN=true so the frontend picks up the dropbox connector type correctly. One heads up -- there's a pre-existing build error in `sign-out-with-idp-logout.tsx` on `getIdentityProviderIdpLogoutUrl` that's unrelated to this PR. 74 tests passing across the Dropbox connector, Google Drive, and folder traversal utility. Demo video: youtu.be/b05cE0FftAE -- had to switch from Loom as it was rejecting the file. Ready for another look.
Shadrack Gichana —
Hello Joey, pushed the two fixes. Working on the end-to-end demo with Tilt right now -- will share the recording once the environment is up.
Shadrack Gichana —
Hello Joey, I'm having trouble with API credits, but I'll send the video within the hour. Apologies for any inconvenience caused.
joey (archestra team) —
no worries 🙂
Shadrack Gichana —
Hey Joey, pushed the cursor fix -- full sync now gets a root-scoped cursor after BFS traversal so incremental sync tracks the whole tree correctly. Here's the updated demo video: youtu.be/VK8H1iodAP8 -- shows the connector syncing files from nested Dropbox subfolders and the chat querying the knowledge base.
Though I wasn’t able to get inline citations because that needs an OpenAI embedding key, which I didn't have available -- the sync and retrieval pipeline is working as shown in the logs and in the video. 34 tests passing.
Shadrack Gichana —
Also, I’m no longer able to comment directly on the PR because of the new changes: “An owner of this repository has limited the ability to comment to users that have contributed to this repository in the past.”
joey (archestra team) —
hey 👋 yes we made a small on-boarding change for contributors (we’ve had a massive influx of AI slop contributions), we’ll have a blog post on the topic soon but in the interim you just need to go here and follow the steps and you should be all set!
https://archestra.ai/contributor-onboard
Shadrack Gichana —
Thanks for that. I've gone through the guide and followed the instructions. On the demo video, unfortunately I couldn't get inline citations working because I don't have an OpenAI embedding key set up yet; I'm using Llama locally. But the full pipeline from sync to chunk to retrieval is working as demonstrated. Let me know if the demo looks good or if there's anything you'd like me to adjust.
Shadrack Gichana —
Hello Joey, after analysing the code again, I want to clarify the cursor approach: after the BFS traversal completes, we call `filesListFolder` with `recursive: true` on the root path to get a single root-scoped cursor. That cursor is what gets saved in the checkpoint. All subsequent incremental syncs use `filesListFolderContinue` with that cursor, which means Dropbox tracks changes across the entire tree from that point forward and not just the last folder walked. The CI failures are pre-existing in main and unrelated to the Dropbox changes. Happy to walk through the cursor logic in more detail if that would help.
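The cursor flow described above can be sketched like this. It is an illustration under stated assumptions, not the connector's code: `ListFolderClient` is a simplified stand-in that borrows the method names `filesListFolder` and `filesListFolderContinue` but returns the page directly, and `Checkpoint`, `drain`, `rootScopedCheckpoint`, `incrementalSync`, and `makeFakeClient` are hypothetical names. Paging until `has_more` is false before saving the cursor is also an assumption about how the checkpoint is taken.

```typescript
// Simplified page shape (assumption): entries are plain paths here.
interface ListFolderPage {
  entries: string[];
  cursor: string;
  has_more: boolean;
}

// Hypothetical stand-in for the SDK client; only the method names
// come from the thread.
interface ListFolderClient {
  filesListFolder(arg: { path: string; recursive: boolean }): Promise<ListFolderPage>;
  filesListFolderContinue(arg: { cursor: string }): Promise<ListFolderPage>;
}

interface Checkpoint {
  cursor: string;
}

// Follows has_more until the listing is exhausted and returns the last cursor.
async function drain(
  client: ListFolderClient,
  first: ListFolderPage,
): Promise<{ entries: string[]; cursor: string }> {
  const entries = [...first.entries];
  let page = first;
  while (page.has_more) {
    page = await client.filesListFolderContinue({ cursor: page.cursor });
    entries.push(...page.entries);
  }
  return { entries, cursor: page.cursor };
}

// After the full sync: one recursive listing on the root yields a cursor
// that covers the whole tree rather than a single folder.
async function rootScopedCheckpoint(
  client: ListFolderClient,
  rootPath: string,
): Promise<Checkpoint> {
  const first = await client.filesListFolder({ path: rootPath, recursive: true });
  const { cursor } = await drain(client, first);
  return { cursor };
}

// Incremental sync: resume from the saved cursor and save the new one.
async function incrementalSync(
  client: ListFolderClient,
  checkpoint: Checkpoint,
): Promise<{ changed: string[]; checkpoint: Checkpoint }> {
  const first = await client.filesListFolderContinue({ cursor: checkpoint.cursor });
  const { entries, cursor } = await drain(client, first);
  return { changed: entries, checkpoint: { cursor } };
}

// Fake client for illustration: the "cursor" is an offset into a change log.
function makeFakeClient(log: string[], pageSize = 2): ListFolderClient {
  const page = (start: number): ListFolderPage => {
    const end = Math.min(start + pageSize, log.length);
    return { entries: log.slice(start, end), cursor: String(end), has_more: end < log.length };
  };
  return {
    filesListFolder: async () => page(0),
    filesListFolderContinue: async ({ cursor }) => page(Number(cursor)),
  };
}
```

With the fake client, a file added in a nested folder after the checkpoint shows up in the next `incrementalSync` call, which is the behaviour a per-folder cursor from the last folder walked would miss.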
Shadrack Gichana —
Hi Joey, glad the PR is merged! I truly appreciate the guidance and feedback throughout the review process. I have some ideas for potential connectors from my analysis of the codebase, will come better prepared next time. Looking forward to the next contribution!
Hello everyone, my name is Shanu Kumawat, I am a Dart SDK contributor.
I found out about this project while looking for a secure MCP solution for one of my side projects. It's such a great thing you guys are building here; a security layer for MCP is exactly what the ecosystem needs right now. I am looking forward to contributing to this project.